NumPy

Table of Contents

Previous topic

NumPy 1.17.1 Release Notes

Next topic

NumPy 1.16.5 Release Notes

NumPy 1.17.0 Release Notes

This NumPy release contains a number of new features that should substantially improve its performance and usefulness, see Highlights below for a summary. The Python versions supported are 3.5-3.7, note that Python 2.7 has been dropped. Python 3.8b2 should work with the released source packages, but there are no future guarantees.

Downstream developers should use Cython >= 0.29.11 for Python 3.8 support and OpenBLAS >= 3.7 (not currently out) to avoid problems on the Skylake architecture. The NumPy wheels on PyPI are built from the OpenBLAS development branch in order to avoid those problems.

Highlights

  • A new extensible random module along with four selectable random number generators and improved seeding designed for use in parallel processes has been added. The currently available bit generators are MT19937, PCG64, Philox, and SFC64. See below under New Features.

  • NumPy’s FFT implementation was changed from fftpack to pocketfft, resulting in faster, more accurate transforms and better handling of datasets of prime length. See below under Improvements.

  • New radix sort and timsort sorting methods. It is currently not possible to choose which will be used. They are hardwired to the datatype and used when either stable or mergesort is passed as the method. See below under Improvements.

  • Overriding numpy functions is now possible by default, see __array_function__ below.

New functions

Deprecations

numpy.polynomial functions warn when passed float in place of int

Previously functions in this module would accept float values provided they were integral (1.0, 2.0, etc). For consistency with the rest of numpy, doing so is now deprecated, and in future will raise a TypeError.

Similarly, passing a float like 0.5 in place of an integer will now raise a TypeError instead of the previous ValueError.

Deprecate numpy.distutils.exec_command and temp_file_name

The internal use of these functions has been refactored and there are better alternatives. Replace exec_command with subprocess.Popen and temp_file_name with tempfile.mkstemp.

Writeable flag of C-API wrapped arrays

When an array is created from the C-API to wrap a pointer to data, the only indication we have of the read-write nature of the data is the writeable flag set during creation. It is dangerous to force the flag to writeable. In the future it will not be possible to switch the writeable flag to True from python. This deprecation should not affect many users since arrays created in such a manner are very rare in practice and only available through the NumPy C-API.

numpy.nonzero should no longer be called on 0d arrays

The behavior of numpy.nonzero on 0d arrays was surprising, making uses of it almost always incorrect. If the old behavior was intended, it can be preserved without a warning by using nonzero(atleast_1d(arr)) instead of nonzero(arr). In a future release, it is most likely this will raise a ValueError.

Writing to the result of numpy.broadcast_arrays will warn

Commonly numpy.broadcast_arrays returns a writeable array with internal overlap, making it unsafe to write to. A future version will set the writeable flag to False, and require users to manually set it to True if they are sure that is what they want to do. Now writing to it will emit a deprecation warning with instructions to set the writeable flag True. Note that if one were to inspect the flag before setting it, one would find it would already be True. Explicitly setting it, though, as one will need to do in future versions, clears an internal flag that is used to produce the deprecation warning. To help alleviate confusion, an additional FutureWarning will be emitted when accessing the writeable flag state to clarify the contradiction.

Note that for the C-side buffer protocol such an array will return a readonly buffer immediately unless a writable buffer is requested. If a writeable buffer is requested a warning will be given. When using cython, the const qualifier should be used with such arrays to avoid the warning (e.g. cdef const double[::1] view).

Future Changes

Shape-1 fields in dtypes won’t be collapsed to scalars in a future version

Currently, a field specified as [(name, dtype, 1)] or "1type" is interpreted as a scalar field (i.e., the same as [(name, dtype)] or [(name, dtype, ()]). This now raises a FutureWarning; in a future version, it will be interpreted as a shape-(1,) field, i.e. the same as [(name, dtype, (1,))] or "(1,)type" (consistently with [(name, dtype, n)] / "ntype" with n>1, which is already equivalent to [(name, dtype, (n,)] / "(n,)type").

Compatibility notes

float16 subnormal rounding

Casting from a different floating point precision to float16 used incorrect rounding in some edge cases. This means in rare cases, subnormal results will now be rounded up instead of down, changing the last bit (ULP) of the result.

Signed zero when using divmod

Starting in version 1.12.0, numpy incorrectly returned a negatively signed zero when using the divmod and floor_divide functions when the result was zero. For example:

>>> np.zeros(10)//1
array([-0., -0., -0., -0., -0., -0., -0., -0., -0., -0.])

With this release, the result is correctly returned as a positively signed zero:

>>> np.zeros(10)//1
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

MaskedArray.mask now returns a view of the mask, not the mask itself

Returning the mask itself was unsafe, as it could be reshaped in place which would violate expectations of the masked array code. The behavior of mask is now consistent with data, which also returns a view.

The underlying mask can still be accessed with ._mask if it is needed. Tests that contain assert x.mask is not y.mask or similar will need to be updated.

Do not lookup __buffer__ attribute in numpy.frombuffer

Looking up __buffer__ attribute in numpy.frombuffer was undocumented and non-functional. This code was removed. If needed, use frombuffer(memoryview(obj), ...) instead.

out is buffered for memory overlaps in take, choose, put

If the out argument to these functions is provided and has memory overlap with the other arguments, it is now buffered to avoid order-dependent behavior.

Unpickling while loading requires explicit opt-in

The functions load, and lib.format.read_array take an allow_pickle keyword which now defaults to False in response to CVE-2019-6446.

Potential changes to the random stream in old random module

Due to bugs in the application of log to random floating point numbers, the stream may change when sampling from beta, binomial, laplace, logistic, logseries or multinomial if a 0 is generated in the underlying MT19937 random stream. There is a 1 in 10^{53} chance of this occurring, so the probability that the stream changes for any given seed is extremely small. If a 0 is encountered in the underlying generator, then the incorrect value produced (either numpy.inf or numpy.nan) is now dropped.

i0 now always returns a result with the same shape as the input

Previously, the output was squeezed, such that, e.g., input with just a single element would lead to an array scalar being returned, and inputs with shapes such as (10, 1) would yield results that would not broadcast against the input.

Note that we generally recommend the SciPy implementation over the numpy one: it is a proper ufunc written in C, and more than an order of magnitude faster.

can_cast no longer assumes all unsafe casting is allowed

Previously, can_cast returned True for almost all inputs for casting='unsafe', even for cases where casting was not possible, such as from a structured dtype to a regular one. This has been fixed, making it more consistent with actual casting using, e.g., the .astype method.

ndarray.flags.writeable can be switched to true slightly more often

In rare cases, it was not possible to switch an array from not writeable to writeable, although a base array is writeable. This can happen if an intermediate ndarray.base object is writeable. Previously, only the deepest base object was considered for this decision. However, in rare cases this object does not have the necessary information. In that case switching to writeable was never allowed. This has now been fixed.

C API changes

dimension or stride input arguments are now passed by npy_intp const*

Previously these function arguments were declared as the more strict npy_intp*, which prevented the caller passing constant data. This change is backwards compatible, but now allows code like:

npy_intp const fixed_dims[] = {1, 2, 3};
// no longer complains that the const-qualifier is discarded
npy_intp size = PyArray_MultiplyList(fixed_dims, 3);

New Features

New extensible numpy.random module with selectable random number generators

A new extensible numpy.random module along with four selectable random number generators and improved seeding designed for use in parallel processes has been added. The currently available Bit Generators are MT19937, PCG64, Philox, and SFC64. PCG64 is the new default while MT19937 is retained for backwards compatibility. Note that the legacy random module is unchanged and is now frozen, your current results will not change. More information is available in the API change description and in the top-level view documentation.

libFLAME

Support for building NumPy with the libFLAME linear algebra package as the LAPACK, implementation, see libFLAME for details.

User-defined BLAS detection order

distutils now uses an environment variable, comma-separated and case insensitive, to determine the detection order for BLAS libraries. By default NPY_BLAS_ORDER=mkl,blis,openblas,atlas,accelerate,blas. However, to force the use of OpenBLAS simply do:

NPY_BLAS_ORDER=openblas python setup.py build

which forces the use of OpenBLAS. This may be helpful for users which have a MKL installation but wishes to try out different implementations.

User-defined LAPACK detection order

numpy.distutils now uses an environment variable, comma-separated and case insensitive, to determine the detection order for LAPACK libraries. By default NPY_LAPACK_ORDER=mkl,openblas,flame,atlas,accelerate,lapack. However, to force the use of OpenBLAS simply do:

NPY_LAPACK_ORDER=openblas python setup.py build

which forces the use of OpenBLAS. This may be helpful for users which have a MKL installation but wishes to try out different implementations.

Timsort and radix sort have replaced mergesort for stable sorting

Both radix sort and timsort have been implemented and are now used in place of mergesort. Due to the need to maintain backward compatibility, the sorting kind options "stable" and "mergesort" have been made aliases of each other with the actual sort implementation depending on the array type. Radix sort is used for small integer types of 16 bits or less and timsort for the remaining types. Timsort features improved performace on data containing already or nearly sorted data and performs like mergesort on random data and requires O(n/2) working space. Details of the timsort algorithm can be found at CPython listsort.txt.

packbits and unpackbits accept an order keyword

The order keyword defaults to big, and will order the bits accordingly. For 'order=big' 3 will become [0, 0, 0, 0, 0, 0, 1, 1], and [1, 1, 0, 0, 0, 0, 0, 0] for order=little

unpackbits now accepts a count parameter

count allows subsetting the number of bits that will be unpacked up-front, rather than reshaping and subsetting later, making the packbits operation invertible, and the unpacking less wasteful. Counts larger than the number of available bits add zero padding. Negative counts trim bits off the end instead of counting from the beginning. None counts implement the existing behavior of unpacking everything.

linalg.svd and linalg.pinv can be faster on hermitian inputs

These functions now accept a hermitian argument, matching the one added to linalg.matrix_rank in 1.14.0.

divmod operation is now supported for two timedelta64 operands

The divmod operator now handles two timedelta64 operands, with type signature mm->qm.

fromfile now takes an offset argument

This function now takes an offset keyword argument for binary files, which specifics the offset (in bytes) from the file’s current position. Defaults to 0.

New mode “empty” for pad

This mode pads an array to a desired shape without initializing the new entries.

Floating point scalars implement as_integer_ratio to match the builtin float

This returns a (numerator, denominator) pair, which can be used to construct a fractions.Fraction.

Structured dtype objects can be indexed with multiple fields names

arr.dtype[['a', 'b']] now returns a dtype that is equivalent to arr[['a', 'b']].dtype, for consistency with arr.dtype['a'] == arr['a'].dtype.

Like the dtype of structured arrays indexed with a list of fields, this dtype has the same itemsize as the original, but only keeps a subset of the fields.

This means that arr[['a', 'b']] and arr.view(arr.dtype[['a', 'b']]) are equivalent.

.npy files support unicode field names

A new format version of 3.0 has been introduced, which enables structured types with non-latin1 field names. This is used automatically when needed.

Improvements

Array comparison assertions include maximum differences

Error messages from array comparison tests such as testing.assert_allclose now include “max absolute difference” and “max relative difference,” in addition to the previous “mismatch” percentage. This information makes it easier to update absolute and relative error tolerances.

Replacement of the fftpack based fft module by the pocketfft library

Both implementations have the same ancestor (Fortran77 FFTPACK by Paul N. Swarztrauber), but pocketfft contains additional modifications which improve both accuracy and performance in some circumstances. For FFT lengths containing large prime factors, pocketfft uses Bluestein’s algorithm, which maintains O(N log N) run time complexity instead of deteriorating towards O(N*N) for prime lengths. Also, accuracy for real valued FFTs with near prime lengths has improved and is on par with complex valued FFTs.

Further improvements to ctypes support in numpy.ctypeslib

A new numpy.ctypeslib.as_ctypes_type function has been added, which can be used to converts a dtype into a best-guess ctypes type. Thanks to this new function, numpy.ctypeslib.as_ctypes now supports a much wider range of array types, including structures, booleans, and integers of non-native endianness.

numpy.errstate is now also a function decorator

Currently, if you have a function like:

def foo():
    pass

and you want to wrap the whole thing in errstate, you have to rewrite it like so:

def foo():
    with np.errstate(...):
        pass

but with this change, you can do:

@np.errstate(...)
def foo():
    pass

thereby saving a level of indentation

numpy.exp and numpy.log speed up for float32 implementation

float32 implementation of exp and log now benefit from AVX2/AVX512 instruction set which are detected during runtime. exp has a max ulp error of 2.52 and log has a max ulp error or 3.83.

Improve performance of numpy.pad

The performance of the function has been improved for most cases by filling in a preallocated array with the desired padded shape instead of using concatenation.

numpy.interp handles infinities more robustly

In some cases where interp would previously return nan, it now returns an appropriate infinity.

Pathlib support for fromfile, tofile and ndarray.dump

fromfile, ndarray.ndarray.tofile and ndarray.dump now support the pathlib.Path type for the file/fid parameter.

Specialized isnan, isinf, and isfinite ufuncs for bool and int types

The boolean and integer types are incapable of storing nan and inf values, which allows us to provide specialized ufuncs that are up to 250x faster than the previous approach.

isfinite supports datetime64 and timedelta64 types

Previously, isfinite used to raise a TypeError on being used on these two types.

New keywords added to nan_to_num

nan_to_num now accepts keywords nan, posinf and neginf allowing the user to define the value to replace the nan, positive and negative np.inf values respectively.

MemoryErrors caused by allocated overly large arrays are more descriptive

Often the cause of a MemoryError is incorrect broadcasting, which results in a very large and incorrect shape. The message of the error now includes this shape to help diagnose the cause of failure.

floor, ceil, and trunc now respect builtin magic methods

These ufuncs now call the __floor__, __ceil__, and __trunc__ methods when called on object arrays, making them compatible with decimal.Decimal and fractions.Fraction objects.

quantile now works on fraction.Fraction and decimal.Decimal objects

In general, this handles object arrays more gracefully, and avoids floating- point operations if exact arithmetic types are used.

Support of object arrays in matmul

It is now possible to use matmul (or the @ operator) with object arrays. For instance, it is now possible to do:

from fractions import Fraction
a = np.array([[Fraction(1, 2), Fraction(1, 3)], [Fraction(1, 3), Fraction(1, 2)]])
b = a @ a

Changes

median and percentile family of functions no longer warn about nan

numpy.median, numpy.percentile, and numpy.quantile used to emit a RuntimeWarning when encountering an nan. Since they return the nan value, the warning is redundant and has been removed.

timedelta64 % 0 behavior adjusted to return NaT

The modulus operation with two np.timedelta64 operands now returns NaT in the case of division by zero, rather than returning zero

NumPy functions now always support overrides with __array_function__

NumPy now always checks the __array_function__ method to implement overrides of NumPy functions on non-NumPy arrays, as described in NEP 18. The feature was available for testing with NumPy 1.16 if appropriate environment variables are set, but is now always enabled.

lib.recfunctions.structured_to_unstructured does not squeeze single-field views

Previously structured_to_unstructured(arr[['a']]) would produce a squeezed result inconsistent with structured_to_unstructured(arr[['a', b']]). This was accidental. The old behavior can be retained with structured_to_unstructured(arr[['a']]).squeeze(axis=-1) or far more simply, arr['a'].

clip now uses a ufunc under the hood

This means that registering clip functions for custom dtypes in C via descr->f->fastclip is deprecated - they should use the ufunc registration mechanism instead, attaching to the np.core.umath.clip ufunc.

It also means that clip accepts where and casting arguments, and can be override with __array_ufunc__.

A consequence of this change is that some behaviors of the old clip have been deprecated:

  • Passing nan to mean “do not clip” as one or both bounds. This didn’t work in all cases anyway, and can be better handled by passing infinities of the appropriate sign.

  • Using “unsafe” casting by default when an out argument is passed. Using casting="unsafe" explicitly will silence this warning.

Additionally, there are some corner cases with behavior changes:

  • Padding max < min has changed to be more consistent across dtypes, but should not be relied upon.

  • Scalar min and max take part in promotion rules like they do in all other ufuncs.

__array_interface__ offset now works as documented

The interface may use an offset value that was mistakenly ignored.

Pickle protocol in savez set to 3 for force zip64 flag

savez was not using the force_zip64 flag, which limited the size of the archive to 2GB. But using the flag requires us to use pickle protocol 3 to write object arrays. The protocol used was bumped to 3, meaning the archive will be unreadable by Python2.

Structured arrays indexed with non-existent fields raise KeyError not ValueError

arr['bad_field'] on a structured type raises KeyError, for consistency with dict['bad_field'].