Future
import numpy as np
from myst_nb import glue

# Shared parameters for reading the free-text comment column from each TSV file.
loading_params = dict(
    delimiter="\t", skiprows=1, dtype='U', usecols=0, comments=None
)
biggest_impact = np.loadtxt(
    "data/2020/biggestimpact_comments_master.tsv", **loading_params
)
other_changes = np.loadtxt(
    "data/2020/significantchanges_comments_master.tsv", **loading_params
)
# Filter out empty responses
biggest_impact = biggest_impact[biggest_impact != '']
other_changes = other_changes[other_changes != '']
# Re-order the responses with a fixed seed so they are not listed in submission order
rng = np.random.default_rng(0xDEADC0DE)
rng.shuffle(biggest_impact)
rng.shuffle(other_changes)
# Glue the number of responses to each question for reference in the text
glue('num_biggest_impact', biggest_impact.shape[0], display=False)
glue('num_other', other_changes.shape[0], display=False)
To conclude the survey, we asked participants to share their thoughts on what changes to NumPy would have the most significant impact for them as users.
Biggest Impact
We asked survey participants the following question:
What single immediate change to NumPy would bring the most value to you as a NumPy user?
The responses of the 239 survey participants who answered this question are listed below.
# Render the responses as a single-column Markdown table (gen_mdlist is a local helper).
gen_mdlist(biggest_impact, "biggest_impacts_list.md")
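gen_mdlist is a small helper defined elsewhere in this repository, not part of NumPy or myst_nb. As a rough, hypothetical sketch (only the name and the output file come from the cell above; the body below is an assumption), it could simply write the responses out as a single-column Markdown table:

# Hypothetical sketch of gen_mdlist; the real helper lives elsewhere in the
# survey repository, so the implementation details here are assumptions.
def gen_mdlist(responses, fname):
    """Write an array of free-text responses to ``fname`` as a
    single-column Markdown table with a "Comments" header."""
    lines = ["| Comments |", "| --- |"]
    for comment in responses:
        # Escape pipe characters so a response can't break the table layout.
        lines.append("| {} |".format(comment.replace("|", "\\|")))
    with open(fname, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")

The table that follows lists the generated responses.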
Comments |
---|
simple possibility to use different CPU cores for parallel different arrays calculating without complex sintax |
Greater reach to users in general. (Original in Spanish: Mayor alcance a las usuarias en general.) |
Automatic differentiation |
Better error messages and error handling. Sometimes numpy has incorrect statements and includes text not intended for the error message. |
User-defined scalar types (e.g. high precision floats) |
consistency with xarray api, migration path to using numpy.random with Jax |
More speed! I use numpy because it’s fast - I want more of that. Also articles and tips on how to use performance features (out kwarg comes to mind) |
Better bug tracking (current number of open but duplicated bugs in GitHub is excessive) |
Unite Numpy and Scipy in a single project / library. (Original in Spanish: Unir Numpy y Scipy en un único proyecto / librería.) |
A clearly defined and smaller API that is 100% compatible with Dask, CuPy and other array libraries. |
signal processing and filter design tools (lighter weight versions of the tools in scipy) |
Not sure. I generally love Numpy! Maybe including the Minuit optimization package more formally into the numpy ecosystem. |
a ruthless standardization on snake_case methods instead of “sometimes it’s .foobar() and sometimes it’s .foo_bar()” |
documentation should be more interactive |
Most of the documentation examples are extremely concise, and only illustrate one to a few uses of an API. |
Easier/native reading of Fortran binary files |
Full fledged typing (mypy) support, with dimensions, shape and dtype. |
newcomers tutorials |
Best practices |
Better type annotations |
Proper methods for 1:1 image display in matplotlib |
Examples and documentation |
small matrix optimizations |
a homogeneous use of size and shape parameters :p |
Versatile boundary conditions for time integration |
Type hints |
Parallelization |
more How Tos |
even better docs |
COMMENTS |
Improvements in performance |
More examples of use with visualisation tools |
I would love to see some kind of series of videos or webinars teaching numpy |
Interoperability with other low level array libraries. |
GPU |
Documentation accessibility |
Documentation improvement to add clearer examples (maybe with some visualizations) |
A more intuitive dtype system. |
less pickiness about what kind of sequence of things a function (array, list, tuple …) can accept (this may be a general Python issue, though.) |
Clearer documentation |
Add homogeneous transformations |
Stop unwrapping zero-dimensional arrays into scalars. |
GPU support |
Easier docs for multidimensional operations (from stack/roll to whatever) |
Matrix operations |
Parallelism in summations. (Original in Spanish: Paralelismo en sumatorias.) |
Multi-threading by default |
Maybe easier way to move data between library especially deep learning. |
Integration with units libraries |
Plotting data easily |
Better string array support and performance |
More examples of its use (particularly for my specific domain). |
Documentation |
Clearer opportunities to give back to the community |
np.nan for int arrays |
Column vectors as default as opposed to 1d vectors. |
Multithreaded einsum |
closer coordination with pandas |
Improvements in Documentation |
CUDA backend like jax |
As optimization libraries |
More tutorials and use cases for linear algebra |
Better array reps in jupyter ? |
Greater integration with Python. (Original in Spanish: Mayor integración con Python.) |
became a framework |
Nullable integers |
numpy.dot should work on arrays of shape (…,n) x (n,…). |
Speed |
Increased adoption by other frameworks of array_function protocol |
Smaller steering committee |
Type annotations. (But that’s coming, from what I hear!) |
More docs |
randomized linear algebra |
Documentation and Examples |
An API/set of hooks to allow functions like |
Faster FFTs. |
Comprehensive documentation |
I run into floating point rounding errors often, sometimes that cause large bugs. This seems to stem from np.loadtxt. |
F2py handle Fortran 2008 and be thread safe |
NEP 21 |
Integrate quaternions as a basic type. (Original in Spanish: Integrar los cuaterniones como un tipo básico.) |
beginner-friendly numpy is too comp. |
Easy but efficient parallelism (like Mathematica’s ParallelMap). |
Stochastic linear algebra; specifically the ability to find the determinant of a LinearOperator. This might fit better in SciPy than NumPy |
why zeros() uses shape, while randint() uses size? I always forget which is which |
Tools for simulations (Monte Carlo method, generation of random numbers). (Original in Spanish: Herramientas para simulaciones (método Monte Carlo, generación de números aleatorios). ) |
Creation of a reliable binary format storage option. |
Faster masked arrays |
As a user, probably improved clarity/consistency of the documentation |
Increase the types of special functions. (Original in Japanese: 特殊関数の種類を増やします.) |
Static typing |
Better native GPU integration |
faster small arrays |
changing numpy’s name to np!!! |
Better documentation of packages. |
cleaner separation with scipy of FT - 2 equivalent modules with differences ! |
autodiff [but I am not sure I’d want it in numpy!] |
copy vs original array specifications |
GPU backend :) |
Fixing masked array and making them “first-class citizens”. |
Something Community-wise, I’m sure. |
new nditer C API support in Cython |
Support for hardware accelerators |
Performance comparable to rust ndarray (in my experience when used correctly Rust ndarray is ~2x faster) |
Add an HOWTO example to documentation on how to extend Numpy with a gufunc written in Cython |
this is a big ask, but it would be nice if numpy could run on gpus |
processing speed. (Original in Japanese: 処理速度.) |
A document describing best practices for using NumPy for scientific computing, targeted to a researcher using the package with limited computer science knowledge. |
Efficiency |
integer calculations like prime factorization |
how ndarray is displayed as a 2D list not matrix |
Adding the option to use more functions as methods. For many operations (sum, max, argmax, real/imag…) we have the option to access them as functions or methods. Many others (abs, angle, diff, sin, cos…) can only be used as functions. Sometimes it would be cleaner to write code with these as methods. |
Some statistical or linear algebra functions are both in scipy and numpy, this is confusing for me. |
Defining a numpy array with a string index |
optional parallel computing |
even better integration with other toolkits |
It’s hard to say. Numpy is probably my favorite library ever. If I was forced, maybe a course developed for both new and intermediate users. |
more robust documentation |
Improved performance. (Original in Japanese: パフォーマンスの向上.) |
GPU usage |
more examples in the wild. This is largely out of numpy’s control though. |
ability to run on GPU. |
More tutorials |
Make numpy.unique() handle objects arrays containing None again - this was working in Py2. |
anything to help bring people update with confidence - the BDFL for my project still uses py27, and it’s been pulling teeth to get him to 1.14. Even though I use 1.16 and 1.18 depending on the project, he is nervous because he remembers ‘that one time numpy changed how views are handled’ |
Providing the features that would stop PyTorch, TensorFlow, JAX, etc. from reimplementing NumPy and fragmenting the ecosystem. I think this is only CPU/GPU transparency (i.e. absorb CuPy). We don’t want to go back to the days of Numeric vs NumArray! |
Tutorials and more examples in documentation |
Improved seo that puts the latest docs at the top of searches. I frequently will end up with links to 1.15 docs |
portability |
Custom dtypes |
linear interpolation along an axis |
Be able to transform a 1D horizontal array to a 2D vertical array with the ‘.T’ transform. |
Better documentation. Examples. And explanation of underlying logic. It’s already good. But always could be better |
Speed and ease of use. (Original in Spanish: Velocidad y facilidad de uso.) |
More readable documentation would be welcome but otherwise NumPy is awesome! |
np.unique should accept a tolerance keyword that treats floats as the same if they differ by less than the tolerance. |
best practice and performance comparison of optimal/sub-optimal usages, and tutorial/documentation in this direction |
Documentation (tutorials) |
This might not be possible but having a fast way to iterate through arrays in a python for loop would make some operations easier. |
I can proudly say all the improvements I want to see (in things like docs) would be large, no low hanging fruit. |
Manipulation of ndarrays (indexing into, reshaping, etc.) could stand to be a little more transparent. |
more error messages for debugging |
I usually switch away from Numpy when my arrays contain strings. Perhaps there is a better way? |
Mentorship (stronger involvement in NumPy). Some less used features are completely unknown to me and it is hard to find tutorials/materials on them besides the documentation. |
Multithreaded functions |
Better documentation |
It would be really nice to have an api from numpy that evaluated the performance costs/benifits between different function calls with some input data, (like np.mat vs np.array, or np.dot vs np.einsum). It would make it easier to compare and see what I should be using in a specific case |
More functionalities for images 2d and 3d |
Names dimensions |
Low level explainations |
more documentation for advanced users for maximum performance |
I would like an explicit pointer syntax |
Static type hints |
Performance |
Ragged arrays/dtypes |
Working with JAX to add the numpy protocols. Then I can really use either library however I want! |
N-D linear interp |
Adding a “a.b” notation for dot products |
low-level parallel computing |
FASTER |
Clear and concise concatenation of 1D arrays to form a shape (N, 2) array. Currently using `np.vstack((…)).T’. |
Some finances module, but other than that is awesome as it is now |
Better tutorials and or easier way to create ufuncs |
rational number support with arbitrary capacity (int8, int16, etc). Need this for chemical stoichiometry calculations, specifically for calculating nullspace of stoichiometry matrix. |
Make the API reference less ad-hoc. See the Java docs for the ideal model. |
Faster multi-threaded operations (but this is out of scope and I’m happy using other libraries) |
[honestly it’s perfect] |
labelled arrays |
CUDA integration… |
A place for writing and submitting tutorials on how to implement things in numpy, and ways to link numpy functions to these tutorials. |
Support to visualize data (matplotlib often too complicated) |
GPU usage |
optimization |
Parallelization features |
Better documentation of linear algebra wrappers |
JIT |
separate the C code from the python code: less extensive use of the CPython C-API |
More visualization tools |
Support for type annotations |
NEP-35 and NEP-37 widespread adoption |
.index() … I’ve been seriously considering dropping numpy entirely in favour of pytorch over this, and frankly given how long it’s been I think it might be prudent to do so even if numpy added .index() today. |
More and better examples of using Numpy with more realistic data. (Original in Spanish: Más y mejores ejemplos de uso de Numpy con datos más realistas.) |
Alternatives to very large arrays (memory error). (Original in Spanish: Alternativas a arrays muuuy grandes (memory error).) |
Contract Simplification (mainly the sugar side of things) |
Weighted quantiles. I’m working on it |
Packaging of mkl libraries other than conda (wheel). (Original in Japanese: conda以外のmklライブラリのパッケージ化(wheel).) |
CUDA |
GPU support |
Better modern Fortran support in f2py |
A more user friendly vector class for linear algebra |
Synchronization between numpy.linalg and scipy.linalg. |
Consistent null value handling bumpy array |
Easier to understand documentation |
Better performance (paralelization) |
(py)FFTW backend |
Updated documentation for f2py |
A more consistent API, perhaps? (Original in Portuguese: Uma API mais consistente, talvez?) |
Add a way to keep track of units and to display answers with units |
documentation |
Documentation |
Usability. Make it simpler to use |
More speed ;) |
Performance boosts using inherent parallelism. |
Have a better documentation and tutorials. |
Better examples on doc pages. Almost always I have to check stackoverflow to understand the function better. |
Better control of array memory. |
Language-independent API |
Performant vectorisation |
A clarification in the function documentation to quickly know if it works in view or in copy. (Original in French: Une clarification dans la documentation des fonctions pour savoir rapidement si elle travaille en vue ou en copie.) |
Easy Documentation.. New learning is difficult with the current documentation model |
An easier way to handle arrays larger than memory |
better documentation, with more examples and use cases. |
Give more examples along with the documentation, give use cases, redesign docs page |
More integration with numba jit & cuda |
Better tutorial/documentation on how to efficiently use numpy features (ufunc etc.) |
More extensive and tutorial like documentation like stack overflow is with a continuous example |
support NA/missing values |
Increased random support. (Original in Spanish: Mayor soporte de random.) |
Why do you speak in feminine? (Original in Spanish: Por que habláis en femenino?) |
Codifying a “minimal NumPy” |
Would love a feature to extract both the min and max of an array (with an optional axis parameter) in one stride |
GPU |
Multithreaded 2 and 3 dimensional FFTs |
Adding the feature I requested |
Making faster. Python is inefficient and Numpy does not help by default. |
Better alternative for SWIG to wrap a proprietary I/O library written in C++ |
I think your masked array implementation is kind of clunky. The relationship between the mask and the underlying data array can get confusing. In particular, the behavior of the fill value is confusing. Setting something to the fill value in the data array doesn’t change the mask. Changing the mask doesn’t seem to update the data array. It’s been a while since I’ve had to deal with this issue, but it can get confusing. |
Allowing users to perform operations with one dimension removed. Eg adding a matrix of (3,4) to a vector of shape (3,) |
I would like documentation in Spanish in the most complex areas. (Original in Spanish: Me gustaria documentacion en español en las areas mas complejas.) |
Clearer separation between numpy and scipy in overlapping domains (linalg comes to mind) |
In-built visualization support for NumPy arrays. Would make it easier to visualise high dimensional arrays. |
allowing to slice an array with another array |
numpy <—> netCDF examples. I know how to do it, but “exchange” between formats would be better documented |
more hand-on with simple level 100 to 500 |
Way to access specific parts of the library since putting numpy in production is heavy. (Original in Spanish: Manera de acceder a partes específicas de la librería ya que poner numpy en produccion es pesado.) |
Improve performance |
ONNX support |
Became more Developer friendly |
GPU acceleration |
Other Significant Changes
Finally, we asked participants to share any other changes that would significantly improve NumPy. The responses of the 110 participants who answered this question are listed below.
# Same helper as above: render this question's responses as a Markdown table.
gen_mdlist(other_changes, "other_changes_list.md")
Comments |
---|
adaptability |
Documentation in Spanish |
I do not understand why np.random.rand(10,10) is okay, but np.ones/zeros(10,10) is not. |
Direct GPU support, without using other packages like nuba |
kill pandas |
Javascript API AND GPU support |
GPU support. Other languages support (Rust, PHP, Javascript) |
built in parallelization over machines |
Restructure and simplify the documentation |
Reduction on the size, splitting parts of niche modules into stand alone projects |
Type annotations, they help a lot to streamline development in supporting IDEs |
GPU adaption, optimisation tools |
Add static types |
I like this idea of a mentorship program to contribute :) |
Ability to run in low computing resource environments. (Original in Spanish: Capacidad de correr en entornos de bajos recursos computacionales.) |
Memory mapped numpy array and support B for custom hardware |
Capacity improvements and interpolation. (Original in Spanish: Mejoras en la capacidad e interpolación.) |
Non pickle serialization |
More communication on quality and usage |
user gufuncs |
Tutorials to improve performance without being an expert C or Fortran user. (Original in Spanish: Tutoriales para mejorar rendimiento sin ser usuario experto de C o Fortran.) |
Although this is probably not too feasible, some limited ability to write “non-vectorised” multi-line maths/logic in simple loop-like structures (e.g. where perhaps tools like Numba might be overkill) and have it optimise the loop overhead (e.g. like some sort of local static typing) would be quite nice to see. |
make loadtxt and savetxt more symmetrical and flexible |
Automatic differentiation |
Cython interaction is sometimes awkward (should I use Cython memoryviews or an ndarray?) |
Tighter integration / support for numpy extensions like numba and cupy |
From my point of view, numpy is most important as the base on which other scientific packages are built. The ones I use most are scipy.stats, astropy, and pandas. In that respect, I don’t see a need for significant changes in numoy |
More speed ;) |
Implicit support for specialized compute hardware (GPU) |
A syntax more similar to that of R. (Original in Portuguese: Uma sintaxe mais similar a do R.) |
Nothing: NumPy is excellent! |
Document the errors |
clearer documentation of every array manipulations |
publish a definitive set of APIs |
Real compatibility with PyPy |
Better documentation |
I love numpy, thanks for all your hard work! |
GPU support would speed up some computations on large arrays. |
Faster imports, less reliance on conda-forge |
More widespread adoption in universities. |
MICROPYTHON |
array_module |
Portability and independent |
Support for cuda or OpenCL technologies should be more transparent. (Original in Spanish: El soporte para tecnologías tipo cuda u OpencL debería ser más transparente.) |
What’s the deal with numpy.matrix? |
more elaborate examples |
Support for Mypy |
and parallel and distributed capabilities, not sure if they already available. |
*beginner-friendly numpy is too comp. |
Better type system similar to that of Julia |
Soliciting materials for tutorials from well versed users |
Better IDE Autocomplete |
outer products for vectors are still quite confusing |
Expanding the number of interpolation methods. (Original in Russian: Расширение количества методов интерполяции.) |
Maybe some graphics in the website to clarify multidimensional operations |
Something like Google’s JAX, with differential functionality and better speed. |
more transparent numba (or numba like) integration |
clear up the confusing situation with matrices and arrays or at least explain it more thoroughly |
Differential equations |
Just support for type annotations |
Cleaner C++ interface |
Data types of missing values other than numeric types. (Original in Japanese: 数値型以外の欠損値のデータ型.) |
Adding autodiff, enabling named tensor axes (even if not “labelled”, at least “named”, to keep tensor dimensions’ semantic meaning straight) |
GPU support |
Modern, more readable Sphinx theme. Deprecate unused or out of scope parts of the library. |
I know you’ll never do it, but the matmul operator is nearly useless. I have no control as to whether the user passes in a matrix or scalar, and in many instances both are entire correctly. So my code is still only using dot. I’ve read the arguments of ‘only one way’ and it’s bogus. We don’t have different operators to add int int vs float float, nor should we. So I have dot(F, P).dot(Q) everywhere instead of F@P@Q. It is so hard to read, and prone to error (F.dot(P) can fail if F is not a matrix). I initially went through a bunch of code replacing dot with @ but had to get rid of them all because I was getting endless exceptions. This is sw to do math - please let us express it in a reasonable manner! And make these comment boxes bigger! A one line box to enter a ‘significant’ change? |
Automatic differentiation |
More understanding of low level functions |
Performance tests related to the different ways of programming with Numpy. (Original in Spanish: Pruebas de rendimiento relacionadas con las distintas formas de programar con Numpy.) |
view ndarray as matrix like in matlab or R |
distributed computing |
sincos implementation, exp(1j * x) implementation, vectorized transcendentals (e.g., using Intel SVML/MKL for exp, sin, etc) |
enhancements to object arrays |
Custom FFT kernels |
Better coordination of array-like types throughout the scientific ecosystem |
Native numexpr-ike functionality |
Hpy api for integration with pypy |
Optional auto-parallelization |
Arbitrary precision and the addition of physical constants (although that’s just for convenience). |
Guided examples |
modularisation |
Emphasize on performance! And keep up good work - Numpy is great. |
Interfaces to other languages. (Original in Spanish: Interfaces a otros lenguajes.) |
Combine methods/algorithms under single module to reduce imports (and googling). |
It occasionally breaks or installs incorrectly on windows and with VSCode. Numpy experts might do more to smooth over this issue |
Cleaner documentation with more cross linking |
Progress the masked array topic would be great, there were ideas on a replacement, not sure what the solution is but a more versatile and efficient solution would be awesome. |
GPU computing |
More cohesion? - “one right way to do things” (scalars), actually removing np.matrix, better support for @, etc. |
Make the mean and standard deviation functions account for nan values like the nanmean functions. |
Better integration with charts and big data. (Original in Spanish: Mejor integración con gráficos y grandes volúmenes de datos.) |
Performance efficiency |
GPU support |
more integration with pandas, although this one is the superior one. |
Julia broadcasting syntax would be AWESOME! No ideas how to make happen. |
Having a wider community (both in numbers and in the diversity of its members) |
Support for ragged arrays |
Just better documentation |
Physical units management. |
Even better interoperability with other array libraries. |
GPU support |
Full support to type-hinting |
nothing, you all are doing great with this project, thanks for your awesome work |
np.nan for int arrays |
Computation speed |
rewrite the project in cpp instead of c and drop fortran |
High-level APIs |
Improved documentation on concepts and theory behind NumPy |
Vizualization of NumPy documentation by using “tree” of modules and functions. Also it would be usefull if new branches of the “tree” I didn’t see yet, would be marked with some color |
More functions designed for sparse arrays |