Standard array subclasses#
Note
Subclassing a numpy.ndarray
is possible but if your goal is to create
an array with modified behavior, as do dask arrays for distributed
computation and cupy arrays for GPU-based computation, subclassing is
discouraged. Instead, using numpy’s
dispatch mechanism is recommended.
The ndarray
can be inherited from (in Python or in C)
if desired. Therefore, it can form a foundation for many useful
classes. Often whether to sub-class the array object or to simply use
the core array component as an internal part of a new class is a
difficult decision, and can be simply a matter of choice. NumPy has
several tools for simplifying how your new object interacts with other
array objects, and so the choice may not be significant in the
end. One way to simplify the question is to ask yourself whether the
object you are interested in can be represented as a single array or whether
it really requires two or more arrays at its core.
Note that asarray
always returns the base-class ndarray. If
you are confident that your use of the array object can handle any
subclass of an ndarray, then asanyarray
can be used to allow
subclasses to propagate more cleanly through your subroutine. In
principle, a subclass could redefine any aspect of the array and
therefore, under strict guidelines, asanyarray
would rarely be
useful. However, most subclasses of the array object will not
redefine certain aspects of the array object such as the buffer
interface, or the attributes of the array. One important example,
however, of why your subroutine may not be able to handle an arbitrary
subclass of an array is that matrices redefine the “*” operator to be
matrix-multiplication, rather than element-by-element multiplication.
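The difference between the two conversion functions can be seen in a small sketch. The Tagged subclass and the two helper functions below are hypothetical names used only for illustration.

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical ndarray subclass used only to show type propagation."""


def as_base(x):
    # asarray always hands back a base-class ndarray.
    return np.asarray(x)


def keep_subclass(x):
    # asanyarray passes ndarray subclasses through unchanged.
    return np.asanyarray(x)


t = np.arange(4).view(Tagged)
print(type(as_base(t)).__name__)        # ndarray
print(type(keep_subclass(t)).__name__)  # Tagged
```

A routine that only relies on the buffer and the standard attributes can usually use the second form; one that depends on the meaning of operators such as * is safer with the first.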
Special attributes and methods#
NumPy provides several hooks that classes can customize:
- class.__array_ufunc__(ufunc, method, *inputs, **kwargs)#
New in version 1.13.
Any class, ndarray subclass or not, can define this method or set it to None in order to override the behavior of NumPy’s ufuncs. This works quite similarly to Python’s __mul__ and other binary operation routines.
ufunc is the ufunc object that was called.
method is a string indicating which Ufunc method was called (one of "__call__", "reduce", "reduceat", "accumulate", "outer", "inner").
inputs is a tuple of the input arguments to the ufunc.
kwargs is a dictionary containing the optional input arguments of the ufunc. If given, any out arguments, both positional and keyword, are passed as a tuple in kwargs. See the discussion in Universal functions (ufunc) for details.
The method should return either the result of the operation, or NotImplemented if the operation requested is not implemented.
If one of the input, output, or where arguments has an __array_ufunc__ method, it is executed instead of the ufunc. If more than one of the arguments implements __array_ufunc__, they are tried in the order: subclasses before superclasses, inputs before outputs, outputs before where, otherwise left to right. The first routine returning something other than NotImplemented determines the result. If all of the __array_ufunc__ operations return NotImplemented, a TypeError is raised.
Note
We intend to re-implement numpy functions as (generalized) Ufuncs, in which case it will become possible for them to be overridden by the __array_ufunc__ method. A prime candidate is matmul, which currently is not a Ufunc, but could relatively easily be rewritten as a (set of) generalized Ufuncs. The same may happen with functions such as median, amin, and argsort.
Like with some other special methods in Python, such as __hash__ and __iter__, it is possible to indicate that your class does not support ufuncs by setting __array_ufunc__ = None. Ufuncs always raise TypeError when called on an object that sets __array_ufunc__ = None.
The presence of __array_ufunc__
also influences how ndarray handles binary operations like arr + obj and arr < obj when arr is an ndarray and obj is an instance of a custom class. There are two possibilities. If obj.__array_ufunc__ is present and not None, then ndarray.__add__ and friends will delegate to the ufunc machinery, meaning that arr + obj becomes np.add(arr, obj), and then add invokes obj.__array_ufunc__. This is useful if you want to define an object that acts like an array.
Alternatively, if obj.__array_ufunc__
is set to None, then as a special case, special methods like ndarray.__add__ will notice this and unconditionally return NotImplemented, so that Python falls back to the reflected method of obj (and raises TypeError if obj defines none). This is useful if you want to create objects that interact with arrays via binary operations, but are not themselves arrays. For example, a units handling system might have an object m representing the “meters” unit, and want to support the syntax arr * m to represent that the array has units of “meters”, but not want to otherwise interact with arrays via ufuncs or otherwise. This can be done by setting __array_ufunc__ = None and defining __mul__ and __rmul__ methods. (Note that this means that writing an __array_ufunc__ that always returns NotImplemented is not quite the same as setting __array_ufunc__ = None: in the former case, arr + obj will raise TypeError, while in the latter case it is possible to define a __radd__ method to prevent this.)
The above does not hold for in-place operators, for which ndarray
never returns NotImplemented. Hence, arr += obj would always lead to a TypeError. This is because for arrays in-place operations cannot generically be replaced by a simple reverse operation. (For instance, by default, arr += obj would be translated to arr = arr + obj, i.e., arr would be replaced, contrary to what is expected for in-place array operations.)
Note
If you define __array_ufunc__:
If you are not a subclass of ndarray, we recommend your class define special methods like __add__ and __lt__ that delegate to ufuncs just like ndarray does. An easy way to do this is to subclass from NDArrayOperatorsMixin.
If you subclass ndarray, we recommend that you put all your override logic in __array_ufunc__ and not also override special methods. This ensures the class hierarchy is determined in only one place rather than separately by the ufunc machinery and by the binary operation rules (which give preference to special methods of subclasses; the alternative way to enforce a one-place only hierarchy, of setting __array_ufunc__ to None, would seem very unexpected and thus confusing, as then the subclass would not work at all with ufuncs).
ndarray defines its own __array_ufunc__, which evaluates the ufunc if no arguments have overrides, and returns NotImplemented otherwise. This may be useful for subclasses for which __array_ufunc__ converts any instances of its own class to ndarray: it can then pass these on to its superclass using super().__array_ufunc__(ufunc, method, *inputs, **kwargs), and finally return the results after possible back-conversion. The advantage of this practice is that it ensures that it is possible to have a hierarchy of subclasses that extend the behaviour. See Subclassing ndarray for details.
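The two uses of __array_ufunc__ described in this entry can be sketched as follows. LoggedArray and Unit are hypothetical classes invented for this illustration: the first converts its own instances to ndarray, delegates to its superclass, and converts back, while the second sets __array_ufunc__ = None and relies on a reflected method. This is a minimal sketch, not a complete subclass.

```python
import numpy as np


class LoggedArray(np.ndarray):
    """Hypothetical subclass that records every ufunc applied to it."""

    calls = []  # shared log, for illustration only

    def __array_ufunc__(self, ufunc, method, *inputs, out=None, **kwargs):
        LoggedArray.calls.append((ufunc.__name__, method))
        # Convert our own instances to plain ndarrays before delegating.
        inputs = tuple(
            x.view(np.ndarray) if isinstance(x, LoggedArray) else x
            for x in inputs
        )
        if out is not None:
            kwargs["out"] = tuple(
                o.view(np.ndarray) if isinstance(o, LoggedArray) else o
                for o in out
            )
        result = super().__array_ufunc__(ufunc, method, *inputs, **kwargs)
        # Back-convert plain array results; leave scalars, tuples and
        # NotImplemented untouched.
        if isinstance(result, np.ndarray):
            return result.view(LoggedArray)
        return result


class Unit:
    """Hypothetical unit object that opts out of ufuncs entirely."""

    __array_ufunc__ = None

    def __init__(self, name):
        self.name = name

    def __rmul__(self, other):
        # Reached for `arr * unit` because ndarray.__mul__ defers to us.
        return (np.asarray(other), self.name)


a = np.arange(3).view(LoggedArray)
b = a + 1                          # dispatched through np.add
tagged = np.arange(3) * Unit("m")  # handled by Unit.__rmul__
```

Calling np.multiply(np.arange(3), Unit("m")) directly is expected to raise TypeError, since the ufunc refuses any operand that sets __array_ufunc__ = None.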
- class.__array_function__(func, types, args, kwargs)#
New in version 1.16.
func is an arbitrary callable exposed by NumPy’s public API, which was called in the form func(*args, **kwargs).
types is a collection (collections.abc.Collection) of unique argument types from the original NumPy function call that implement __array_function__.
The tuple args and dict kwargs are directly passed on from the original call.
As a convenience for __array_function__ implementers, types provides all argument types with an '__array_function__' attribute. This allows implementers to quickly identify cases where they should defer to __array_function__ implementations on other arguments. Implementations should not rely on the iteration order of types.
Most implementations of __array_function__ will start with two checks:
Is the given function something that we know how to overload?
Are all arguments of a type that we know how to handle?
If these conditions hold, __array_function__ should return the result from calling its implementation for func(*args, **kwargs). Otherwise, it should return the sentinel value NotImplemented, indicating that the function is not implemented by these types.
There are no general requirements on the return value from __array_function__, although most sensible implementations should probably return array(s) with the same type as one of the function’s arguments.
It may also be convenient to define a custom decorator (implements below) for registering __array_function__ implementations.
import numpy as np

HANDLED_FUNCTIONS = {}

class MyArray:
    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        # Note: this allows subclasses that don't override
        # __array_function__ to handle MyArray objects
        if not all(issubclass(t, MyArray) for t in types):
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)

def implements(numpy_function):
    """Register an __array_function__ implementation for MyArray objects."""
    def decorator(func):
        HANDLED_FUNCTIONS[numpy_function] = func
        return func
    return decorator

@implements(np.concatenate)
def concatenate(arrays, axis=0, out=None):
    ...  # implementation of concatenate for MyArray objects

@implements(np.broadcast_to)
def broadcast_to(array, shape):
    ...  # implementation of broadcast_to for MyArray objects
Note that it is not required for __array_function__ implementations to include all of the corresponding NumPy function’s optional arguments (e.g., broadcast_to above omits the irrelevant subok argument). Optional arguments are only passed in to __array_function__ if they were explicitly used in the NumPy function call.
Just like the case for builtin special methods like __add__, properly written __array_function__ methods should always return NotImplemented when an unknown type is encountered. Otherwise, it will be impossible to correctly override NumPy functions from another object if the operation also includes one of your objects.
For the most part, the rules for dispatch with __array_function__ match those for __array_ufunc__. In particular:
NumPy will gather implementations of __array_function__ from all specified inputs and call them in order: subclasses before superclasses, and otherwise left to right. Note that in some edge cases involving subclasses, this differs slightly from the current behavior of Python.
Implementations of __array_function__ indicate that they can handle the operation by returning any value other than NotImplemented.
If all __array_function__ methods return NotImplemented, NumPy will raise TypeError.
If no __array_function__ methods exist, NumPy will default to calling its own implementation, intended for use on NumPy arrays. This case arises, for example, when all array-like arguments are Python numbers or lists. (NumPy arrays do have an __array_function__ method, given below, but it always returns NotImplemented if any argument other than a NumPy array subclass implements __array_function__.)
One deviation from the current behavior of __array_ufunc__ is that NumPy will only call __array_function__ on the first argument of each unique type. This matches Python’s rule for calling reflected methods, and this ensures that checking overloads has acceptable performance even when there are a large number of overloaded arguments.
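The dispatch rules above can be exercised end to end with a small runnable sketch. The Wrapped class, the HANDLED registry and the implements helper are hypothetical names chosen for this example; only np.concatenate is overloaded, so any other NumPy function called on a Wrapped object is expected to fail with TypeError.

```python
import numpy as np

HANDLED = {}  # hypothetical registry: NumPy function -> implementation


def implements(numpy_function):
    """Register an implementation for Wrapped objects."""
    def decorator(func):
        HANDLED[numpy_function] = func
        return func
    return decorator


class Wrapped:
    """Hypothetical duck array that stores its data in a plain ndarray."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Check 1: do we know how to overload this function?
        if func not in HANDLED:
            return NotImplemented
        # Check 2: are all overriding argument types ones we can handle?
        if not all(issubclass(t, Wrapped) for t in types):
            return NotImplemented
        return HANDLED[func](*args, **kwargs)


@implements(np.concatenate)
def concatenate(arrays, axis=0):
    # Unwrap, call the NumPy implementation, and wrap the result again.
    return Wrapped(np.concatenate([a.data for a in arrays], axis=axis))


joined = np.concatenate([Wrapped([1, 2]), Wrapped([3])])
print(type(joined).__name__, joined.data)

try:
    np.mean(Wrapped([1, 2]))  # not registered, so every override declines
except TypeError as exc:
    print("TypeError:", exc)
```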
- class.__array_finalize__(obj)#
This method is called whenever the system internally allocates a new array from obj, where obj is a subclass (subtype) of the
ndarray
. It can be used to change attributes of self after construction (so as to ensure a 2-d matrix for example), or to update meta-information from the “parent.” Subclasses inherit a default implementation of this method that does nothing.
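As a minimal sketch of updating meta-information from the parent, the hypothetical InfoArray subclass below carries an extra info attribute (a name assumed for this example) and uses __array_finalize__ to copy it onto arrays derived from an existing instance.

```python
import numpy as np


class InfoArray(np.ndarray):
    """Hypothetical subclass that carries an extra `info` attribute."""

    def __new__(cls, input_array, info=None):
        # View-cast the input to this class, then attach the metadata.
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # obj is None when the array is created by an explicit constructor
        # call; otherwise it is the array that self was derived from.
        if obj is None:
            return
        self.info = getattr(obj, "info", None)


a = InfoArray([1, 2, 3], info="metres")
part = a[1:]                          # new array derived from a by slicing
cast = np.arange(3).view(InfoArray)   # view casting from a plain ndarray
print(part.info, cast.info)
```

Slicing keeps the parent's info, while view casting from a plain ndarray falls back to None because the parent has no such attribute.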
- class.__array_wrap__(array, context=None, return_scalar=False)#
At the end of every ufunc, this method is called on the input object with the highest array priority, or the output object if one was specified. The ufunc-computed array is passed in and whatever is returned is passed to the user. Subclasses inherit a default implementation of this method, which transforms the array into a new instance of the object’s class. Subclasses may opt to use this method to transform the output array into an instance of the subclass and update metadata before returning the array to the user.
NumPy may also call this function without a context from non-ufuncs to allow preserving subclass information.
Changed in version 2.0: return_scalar is now passed as either False (usually) or True indicating that NumPy would return a scalar. Subclasses may ignore the value, or return array[()] to behave more like NumPy.
Note
It is hoped to eventually deprecate this method in favour of __array_ufunc__ for ufuncs (and __array_function__ for a few other functions like numpy.squeeze).
- class.__array_priority__#
The value of this attribute is used to determine what type of object to return in situations where there is more than one possibility for the Python type of the returned object. Subclasses inherit a default value of 0.0 for this attribute.
Note
For ufuncs, it is hoped to eventually deprecate this attribute in favour of
__array_ufunc__
.
- class.__array__(dtype=None, copy=None)#
If defined on an object, should return an ndarray. This method is called by array-coercion functions like np.array() if an object implementing this interface is passed to those functions. Third-party implementations of __array__ must take dtype and copy keyword arguments, as ignoring them might break third-party code or NumPy itself.
dtype is the data type of the returned array.
copy is an optional boolean that indicates whether a copy should be returned. For True a copy should always be made, for None only if required (e.g. due to a passed dtype value), and for False a copy should never be made (if a copy is still required, an appropriate exception should be raised).
Please refer to Interoperability with NumPy for the protocol hierarchy, of which
__array__
is the oldest and least desirable.
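One way to honour the dtype and copy arguments described for __array__ is sketched below. The Readings container is a hypothetical class, and the choice of ValueError for the "copy required but forbidden" case is an assumption of this example.

```python
import numpy as np


class Readings:
    """Hypothetical container that exposes its data through __array__."""

    def __init__(self, values):
        self._values = np.array(values, dtype=np.float64)

    def __array__(self, dtype=None, copy=None):
        needs_cast = dtype is not None and np.dtype(dtype) != self._values.dtype
        if needs_cast:
            if copy is False:
                # A cast cannot be done without copying.
                raise ValueError("a copy is required to change the dtype")
            return self._values.astype(dtype)
        if copy:
            return self._values.copy()
        # copy is None or False and no cast is needed: share the buffer.
        return self._values


r = Readings([1.5, 2.5])
shared = np.asarray(r)                    # no copy needed
as_f32 = np.asarray(r, dtype=np.float32)  # the cast forces a copy
print(np.shares_memory(shared, r._values), as_f32.dtype)
```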
Matrix objects#
Note
It is strongly advised not to use the matrix subclass. As described
below, it makes writing functions that deal consistently with matrices
and regular arrays very difficult. Currently, they are mainly used for
interacting with scipy.sparse. We hope to provide an alternative
for this use, however, and eventually remove the matrix subclass.
matrix objects inherit from the ndarray and therefore have the same
attributes and methods as ndarrays. There are six important
differences of matrix objects, however, that may lead to
unexpected results when you use matrices but expect them to act like
arrays:
Matrix objects can be created using a string notation to allow Matlab-style syntax where spaces separate columns and semicolons (‘;’) separate rows.
Matrix objects are always two-dimensional. This has far-reaching implications, in that m.ravel() is still two-dimensional (with a 1 in the first dimension) and item selection returns two-dimensional objects so that sequence behavior is fundamentally different than arrays.
Matrix objects over-ride multiplication to be matrix-multiplication. Make sure you understand this for functions that you may want to receive matrices, especially in light of the fact that asanyarray(m) returns a matrix when m is a matrix.
Matrix objects over-ride power to be matrix raised to a power. The same warning about using power inside a function that uses asanyarray(…) to get an array object holds for this fact.
The default __array_priority__ of matrix objects is 10.0, and therefore mixed operations with ndarrays always produce matrices.
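Matrices have special attributes which make calculations easier. These are T (transpose), H (complex conjugate transpose), I (multiplicative inverse of an invertible matrix), and A (the matrix as a base ndarray).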
Warning
Matrix objects over-ride multiplication, ‘*’, and power, ‘**’, to be matrix-multiplication and matrix power, respectively. If your subroutine can accept sub-classes and you do not convert to base-class arrays, then you must use the ufuncs multiply and power to be sure that you are performing the correct operation for all inputs.
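The warning can be made concrete with a short sketch. The elementwise_square function is a hypothetical helper that accepts subclasses through asanyarray and therefore uses the multiply ufunc instead of the * operator; the warnings filter only silences the PendingDeprecationWarning that creating a matrix may emit.

```python
import warnings

import numpy as np


def elementwise_square(x):
    """Square each element, whatever ndarray subclass is passed in."""
    x = np.asanyarray(x)
    # np.multiply is element-by-element even for matrix inputs.
    return np.multiply(x, x)


with warnings.catch_warnings():
    warnings.simplefilter("ignore", PendingDeprecationWarning)
    m = np.asmatrix([[1, 2], [3, 4]])
    star = m * m                   # matrix product
    safe = elementwise_square(m)   # element-by-element product

print(star.tolist())  # [[7, 10], [15, 22]]
print(safe.tolist())  # [[1, 4], [9, 16]]
```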
The matrix class is a Python subclass of the ndarray and can be used
as a reference for how to construct your own subclass of the ndarray.
Matrices can be created from other matrices, strings, and anything
else that can be converted to an ndarray. The name “mat” is an
alias for “matrix” in NumPy.
matrix | Returns a matrix from an array-like object, or from a string of data.
asmatrix | Interpret the input as a matrix.
bmat | Build a matrix object from a string, nested sequence, or array.
Example 1: Matrix creation from a string
>>> a = np.asmatrix('1 2 3; 4 5 3')
>>> print((a*a.T).I)
[[ 0.29239766 -0.13450292]
 [-0.13450292  0.08187135]]
Example 2: Matrix creation from nested sequence
>>> np.asmatrix([[1,5,10],[1.0,3,4j]])
matrix([[ 1.+0.j, 5.+0.j, 10.+0.j],
        [ 1.+0.j, 3.+0.j, 0.+4.j]])
Example 3: Matrix creation from an array
>>> np.asmatrix(np.random.rand(3,3)).T
matrix([[4.17022005e-01, 3.02332573e-01, 1.86260211e-01],
[7.20324493e-01, 1.46755891e-01, 3.45560727e-01],
[1.14374817e-04, 9.23385948e-02, 3.96767474e-01]])
Memory-mapped file arrays#
Memory-mapped files are useful for reading and/or modifying small segments of a large file with regular layout, without reading the entire file into memory. A simple subclass of the ndarray uses a memory-mapped file for the data buffer of the array. For small files, the overhead of reading the entire file into memory is typically not significant; for large files, however, using memory mapping can save considerable resources.
Memory-mapped-file arrays have one additional method (besides those
they inherit from the ndarray): .flush(), which
must be called manually by the user to ensure that any changes to the
array actually get written to disk.
memmap | Create a memory-map to an array stored in a binary file on disk.
memmap.flush | Write any changes in the array to the file on disk.
Example:
>>> a = np.memmap('newfile.dat', dtype=float, mode='w+', shape=1000)
>>> a[10] = 10.0
>>> a[30] = 30.0
>>> del a
>>> b = np.fromfile('newfile.dat', dtype=float)
>>> print(b[10], b[30])
10.0 30.0
>>> a = np.memmap('newfile.dat', dtype=float)
>>> print(a[10], a[30])
10.0 30.0
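A variant of the example that calls .flush() explicitly and then reopens the file read-only might look like the following. The temporary directory and the file name example.dat are assumptions of this sketch.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch location for the backing file.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "example.dat")

# Create the file and write through the memory map.
writer = np.memmap(path, dtype=np.float64, mode="w+", shape=(1000,))
writer[10] = 10.0
writer.flush()  # make sure the change reaches the file on disk

# Reopen read-only; the shape is inferred from the file size and dtype.
reader = np.memmap(path, dtype=np.float64, mode="r")
print(reader.shape, reader[10])
```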
Character arrays (numpy.char)#
Note
The chararray class exists for backwards compatibility with
Numarray; it is not recommended for new development. Starting from numpy
1.4, if one needs arrays of strings, it is recommended to use arrays of
dtype object_, bytes_ or str_, and use the free functions
in the numpy.char module for fast vectorized string operations.
These are enhanced arrays of either str_ type or bytes_ type. These arrays
inherit from the ndarray, but specially-define the operations +, *, and %
on a (broadcasting) element-by-element basis. These
operations are not available on the standard ndarray of
character type. In addition, the chararray has all of the
standard str (and bytes) methods,
executing them on an element-by-element basis. Perhaps the easiest
way to create a chararray is to use self.view(chararray)
where self is an ndarray of str or unicode
data-type. However, a chararray can also be created using the
chararray constructor, or via the numpy.char.array function:
chararray | Provides a convenient view on arrays of string and unicode values.
char.array | Create a chararray.
Another difference with the standard ndarray of str data-type is that the chararray inherits the feature introduced by Numarray that white-space at the end of any element in the array will be ignored on item retrieval and comparison operations.
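For new code, the element-by-element string operations are available as free functions in numpy.char that work on ordinary string arrays. A brief sketch, with made-up sample data:

```python
import numpy as np

names = np.array(["ada", "grace"])  # ordinary ndarray of str_ dtype

upper = np.char.upper(names)         # str.upper applied to each element
suffixed = np.char.add(names, "!")   # element-by-element concatenation
doubled = np.char.multiply(names, 2) # element-by-element repetition

print(upper.tolist(), suffixed.tolist(), doubled.tolist())
```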
Record arrays#
NumPy provides the recarray class which allows accessing the
fields of a structured array as attributes, and a corresponding
scalar data type object record.
recarray | Construct an ndarray that allows field access using attributes.
record | A data-type scalar that allows field access as attribute lookup.
Note
The pandas DataFrame is more powerful than a record array. If possible, please use pandas DataFrame instead.
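Attribute access on a record array can be sketched by viewing a structured array as a recarray. The field names x and y are arbitrary choices for this example.

```python
import numpy as np

# A structured array with two named fields.
structured = np.array(
    [(1, 2.0), (3, 4.0)],
    dtype=[("x", "i4"), ("y", "f8")],
)

# Viewing it as a recarray enables attribute-style field access.
rec = structured.view(np.recarray)

print(rec.x)     # the whole "x" field as an array
print(rec[0].y)  # a single record also exposes its fields as attributes
```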
Masked arrays (numpy.ma)#
See also
Standard container class#
For backward compatibility and as a standard “container” class, the
UserArray from Numeric has been brought over to NumPy and named
numpy.lib.user_array.container. The container class is a
Python class whose self.array attribute is an ndarray. Multiple
inheritance is probably easier with numpy.lib.user_array.container
than with the ndarray itself and so it is included by default. It is
not documented here beyond mentioning its existence because you are
encouraged to use the ndarray class directly if you can.
numpy.lib.user_array.container | Standard container-class for easy multiple-inheritance.
Array iterators#
Iterators are a powerful concept for array processing. Essentially, iterators implement a generalized for-loop. If myiter is an iterator object, then the Python code:
for val in myiter:
    ...
    some code involving val
    ...
calls val = next(myiter) repeatedly until StopIteration is
raised by the iterator. There are several ways to iterate over an
array that may be useful: default iteration, flat iteration, and
\(N\)-dimensional enumeration.
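The equivalence between a for-loop and repeated next() calls can be spelled out explicitly. The array contents below are arbitrary sample values.

```python
import numpy as np

arr = np.array([10, 20, 30])

# What `for val in arr:` does behind the scenes.
it = iter(arr)
seen = []
while True:
    try:
        val = next(it)
    except StopIteration:
        # The iterator is exhausted, so the loop ends.
        break
    seen.append(int(val))

print(seen)  # [10, 20, 30]
```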
Default iteration#
The default iterator of an ndarray object is the default Python iterator of a sequence type. Thus, when the array object itself is used as an iterator, the default behavior is equivalent to:
for i in range(arr.shape[0]):
    val = arr[i]
This default iterator selects a sub-array of dimension \(N-1\) from the array. This can be a useful construct for defining recursive algorithms. To loop over the entire array requires \(N\) for-loops.
>>> a = np.arange(24).reshape(3,2,4)+10
>>> for val in a:
...     print('item:', val)
item: [[10 11 12 13]
 [14 15 16 17]]
item: [[18 19 20 21]
 [22 23 24 25]]
item: [[26 27 28 29]
 [30 31 32 33]]
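Because each step of the default iterator yields a sub-array with one dimension fewer, it lends itself to recursion. The recursive_sum function below is a hypothetical illustration of that idea and is not meant as a replacement for arr.sum().

```python
import numpy as np


def recursive_sum(arr):
    """Sum all elements by recursing over the leading dimension."""
    if arr.ndim == 0:
        # Base case: a NumPy scalar (or 0-d array) holds a single value.
        return arr.item()
    # Each `sub` has one dimension fewer than `arr`.
    return sum(recursive_sum(sub) for sub in arr)


a = np.arange(24).reshape(3, 2, 4) + 10
print(recursive_sum(a))  # 516
```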
Flat iteration#
ndarray.flat | A 1-D iterator over the array.
As mentioned previously, the flat attribute of ndarray objects returns an iterator that will cycle over the entire array in C-style contiguous order.
>>> for i, val in enumerate(a.flat):
...     if i%5 == 0: print(i, val)
0 10
5 15
10 20
15 25
20 30
Here, I’ve used the built-in enumerate iterator to return the iterator index as well as the value.
N-dimensional enumeration#
ndenumerate | Multidimensional index iterator.
Sometimes it may be useful to get the N-dimensional index while iterating. The ndenumerate iterator can achieve this.
>>> for i, val in np.ndenumerate(a):
...     if sum(i)%5 == 0: print(i, val)
(0, 0, 0) 10
(1, 1, 3) 25
(2, 0, 3) 29
(2, 1, 2) 32
Iterator for broadcasting#
broadcast | Produce an object that mimics broadcasting.
The general concept of broadcasting is also available from Python
using the broadcast
iterator. This object takes \(N\)
objects as inputs and returns an iterator that returns tuples
providing each of the input sequence elements in the broadcasted
result.
>>> for val in np.broadcast([[1, 0], [2, 3]], [0, 1]):
...     print(val)
(np.int64(1), np.int64(0))
(np.int64(0), np.int64(1))
(np.int64(2), np.int64(0))
(np.int64(3), np.int64(1))
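The broadcast object also exposes the shape of the broadcasted result, which makes it possible to reproduce an element-by-element operation by hand. The sketch below adds two arrays this way and is intended purely as an illustration; x + y does the same thing directly.

```python
import numpy as np

x = np.array([[1], [2], [3]])  # shape (3, 1)
y = np.array([4, 5, 6])        # shape (3,)

b = np.broadcast(x, y)
print(b.shape)  # (3, 3)

# Fill the output in C order from the pairs produced by the iterator.
out = np.empty(b.shape, dtype=x.dtype)
out.flat = [u + v for (u, v) in b]

print(out)
```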