NEP 34 — Disallow inferring
dtype=object from sequences
When users create arrays with sequences-of-sequences, they sometimes err in
matching the lengths of the nested sequences, commonly called “ragged
arrays”. Here we will refer to them as ragged nested sequences. Creating such
arrays via np.array([<ragged_nested_sequence>]) with no dtype keyword
argument will today default to an
object-dtype array. Change the behaviour to raise a ValueError instead.
Motivation and Scope
Users who specify lists-of-lists when creating a numpy.ndarray via
np.array may mistakenly pass in lists of different lengths. Currently we
accept this input and automatically create an array with dtype=object. This
can be confusing, since it is rarely what is desired. Changing the automatic
dtype detection to never return object for ragged nested sequences (defined as a
recursive sequence of sequences, where not all the sequences on the same
level have the same length) will force users who actually wish to create
object arrays to specify that explicitly. Note that lists, tuples, and
np.ndarrays are all sequences. See for instance issue 5303.
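The definition above can be sketched as a small pure-Python check. The helper name `is_ragged` is hypothetical and not part of NumPy. It handles only built-in sequences such as lists, tuples and ranges, treats strings as scalars, and does not attempt to reproduce NumPy's actual dimension discovery.

```python
from collections.abc import Sequence


def _is_seq(obj):
    # Strings are sequences too, but are treated as scalars in this sketch.
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes))


def is_ragged(obj):
    """Hypothetical helper: True if obj is a ragged nested sequence."""
    level = [obj]
    # Walk down one nesting level at a time while every item is a sequence.
    while level and all(_is_seq(item) for item in level):
        if len({len(item) for item in level}) > 1:
            return True  # sequences on the same level differ in length
        level = [child for item in level for child in item]
    # A level that mixes sequences and non-sequences also counts as ragged.
    return any(_is_seq(item) for item in level)


print(is_ragged([[1, 2], [1]]))     # True
print(is_ragged([[1, 2], [3, 4]]))  # False
print(is_ragged([[1, 2], 3]))       # True
```

A check like this could be run on input before it reaches np.array, so that mismatched lengths are reported with an application-specific message.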
Usage and Impact
After this change, array creation with ragged nested sequences must explicitly define a dtype:
>>> np.array([[1, 2], [1]])
ValueError: cannot guess the desired dtype from the input
>>> np.array([[1, 2], [1]], dtype=object)
# succeeds, with no change from current behaviour
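As a minimal runnable sketch of the explicit form, the snippet below builds an object array from a ragged nested sequence and inspects the result. The variable names are illustrative only.

```python
import numpy as np

ragged = [[1, 2], [1]]  # inner lists differ in length

# Opting in explicitly yields a one-dimensional array of Python lists.
arr = np.array(ragged, dtype=object)

print(arr.shape)     # (2,)
print(arr.dtype)     # object
print(type(arr[0]))  # <class 'list'>
```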
The deprecation will affect any call that internally calls np.asarray. For
instance, the assert_equal family of functions calls np.asarray, so
users will have to change code like:
np.testing.assert_equal(a, [[1, 2], 3])
to:
np.testing.assert_equal(a, np.array([[1, 2], 3], dtype=object))
Since it is sometimes hard to determine what shape is desired, one can set the shape of the object array explicitly:
>>> arr = np.empty(correct_shape, dtype=object)
>>> arr[...] = values
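A concrete instance of this pattern, assuming a ragged input whose intended shape is one-dimensional with one entry per inner list. The names `values` and `arr` are placeholders.

```python
import numpy as np

values = [[1, 2], [1]]  # ragged: inner lengths differ

# Allocate the object array with the intended shape first, then fill it.
arr = np.empty(len(values), dtype=object)
arr[...] = values

print(arr.shape)  # (2,)
print(arr[0])     # [1, 2]
print(arr[1])     # [1]
```

Fixing the shape up front removes any guesswork about how deep the nesting should be unpacked.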
We will also reject mixed sequences of non-sequence and sequence. For instance, all of these will be rejected:
>>> arr = np.array([np.arange(10), [10]])
>>> arr = np.array([[range(3), range(3), range(3)], [range(3), 0, 0]])
The code to be changed is inside
PyArray_GetArrayParamsFromObject and the
discover_dimensions function. The first implementation in PR
14794 caused a number of downstream library failures and was reverted before
the release of 1.18. Subsequently, downstream libraries fixed the places where they
were using ragged arrays. The reimplementation became PR 15119, which was
merged for the 1.19 release.
Anyone depending on creating object arrays from ragged nested sequences will
need to modify their code. There will be a deprecation period during which the
current behaviour will emit a DeprecationWarning.
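One way to locate affected code is to escalate warnings to errors, so that the warning and the eventual ValueError are handled the same way. The sketch below assumes NumPy 1.19 or later, where ragged input without a dtype either warns or raises. The helper name `infers_without_dtype` is hypothetical.

```python
import warnings

import numpy as np


def infers_without_dtype(seq):
    """Hypothetical helper: True if np.array(seq) succeeds with no warning."""
    with warnings.catch_warnings():
        # Turn any warning raised during array creation into an exception.
        warnings.simplefilter("error")
        try:
            np.array(seq)
        except (ValueError, Warning):
            return False
    return True


print(infers_without_dtype([[1, 2], [3, 4]]))  # True
```

Passing a ragged input such as [[1, 2], [1]] is expected to return False under the stated version assumption.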
One alternative is to continue with the current situation.
It was also suggested to add a kwarg
depth to array creation, or perhaps to add another array creation API function
ragged_array_object. The goal was to eliminate the ambiguity in creating an object array from
array([[1, 2], [1]], dtype=object): should the returned array have a shape of
(1,), or (2,)? This NEP does not deal with that issue, and only deprecates inferring
dtype=object for ragged nested sequences. Users of ragged nested sequences may face another deprecation cycle in the future. Rationale: we expect that there are very few users who intend to use ragged arrays like that; this was never intended as a use case of NumPy arrays. Users are likely better off with another library or just using lists of lists.
It was also suggested to deprecate all automatic creation of
object-dtype arrays, which would require adding an explicit
dtype=object for something like
np.array([Decimal(10), Decimal(10)]). This too is out of scope for the current NEP. Rationale: it is harder to assess the impact of this larger change, and we are not sure how many users it may affect.
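For contrast, the following sketch shows the kind of automatic object inference that stays outside the scope of this NEP: the input is not ragged, but its elements have no native NumPy dtype.

```python
from decimal import Decimal

import numpy as np

# A flat, regular sequence of Decimal values; no dtype is given.
arr = np.array([Decimal(10), Decimal(10)])

print(arr.dtype)  # object
print(arr.shape)  # (2,)
```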
Comments on issue 5303 indicate that this has been regarded as unintended behaviour since at least 2014. Suggestions to change it have been made in the ensuing years, but none have stuck. The WIP implementation in PR 14794 seems to point to the viability of this approach.
References and Footnotes
This document has been placed in the public domain.