Memory management in NumPy#
The numpy.ndarray
is a python class. It requires additional memory allocations
to hold numpy.ndarray.strides
, numpy.ndarray.shape
and
numpy.ndarray.data
attributes. These attributes are specially allocated
after creating the python object in __new__
. The strides
and shape
are stored in a piece of memory allocated internally.
The data
allocation used to store the actual array values (which could be
pointers in the case of object
arrays) can be very large, so NumPy has
provided interfaces to manage its allocation and release. This document details
how those interfaces work.
Historical overview#
Since version 1.7.0, NumPy has exposed a set of PyDataMem_*
functions
(PyDataMem_NEW
, PyDataMem_FREE
, PyDataMem_RENEW
)
which are backed by alloc, free, realloc respectively.
Since those early days, Python also improved its memory management
capabilities, and began providing
various management policies beginning in version
3.4. These routines are divided into a set of domains, each domain has a
PyMemAllocatorEx
structure of routines for memory management. Python also
added a tracemalloc
module to trace calls to the various routines. These
tracking hooks were added to the NumPy PyDataMem_*
routines.
NumPy added a small cache of allocated memory in its internal
npy_alloc_cache
, npy_alloc_cache_zero
, and npy_free_cache
functions. These wrap alloc
, alloc-and-memset(0)
and free
respectively, but when npy_free_cache
is called, it adds the pointer to a
short list of available blocks marked by size. These blocks can be re-used by
subsequent calls to npy_alloc*
, avoiding memory thrashing.
Configurable memory routines in NumPy (NEP 49)#
Users may wish to override the internal data memory routines with ones of their
own. Since NumPy does not use the Python domain strategy to manage data memory,
it provides an alternative set of C-APIs to change memory routines. There are
no Python domain-wide strategies for large chunks of object data, so those are
less suited to NumPy’s needs. User who wish to change the NumPy data memory
management routines can use PyDataMem_SetHandler
, which uses a
PyDataMem_Handler
structure to hold pointers to functions used to
manage the data memory. The calls are still wrapped by internal routines to
call PyTraceMalloc_Track
, PyTraceMalloc_Untrack
. Since the
functions may change during the lifetime of the process, each ndarray
carries with it the functions used at the time of its instantiation, and these
will be used to reallocate or free the data memory of the instance.
-
type PyDataMem_Handler#
A struct to hold function pointers used to manipulate memory
typedef struct { char name[127]; /* multiple of 64 to keep the struct aligned */ uint8_t version; /* currently 1 */ PyDataMemAllocator allocator; } PyDataMem_Handler;
where the allocator structure is
/* The declaration of free differs from PyMemAllocatorEx */ typedef struct { void *ctx; void* (*malloc) (void *ctx, size_t size); void* (*calloc) (void *ctx, size_t nelem, size_t elsize); void* (*realloc) (void *ctx, void *ptr, size_t new_size); void (*free) (void *ctx, void *ptr, size_t size); } PyDataMemAllocator;
-
PyObject *PyDataMem_SetHandler(PyObject *handler)#
Set a new allocation policy. If the input value is
NULL
, will reset the policy to the default. Return the previous policy, or returnNULL
if an error has occurred. We wrap the user-provided functions so they will still call the python and numpy memory management callback hooks.
-
PyObject *PyDataMem_GetHandler()#
Return the current policy that will be used to allocate data for the next
PyArrayObject
. On failure, returnNULL
.
For an example of setting up and using the PyDataMem_Handler, see the test in
numpy/_core/tests/test_mem_policy.py
What happens when deallocating if there is no policy set#
A rare but useful technique is to allocate a buffer outside NumPy, use
PyArray_NewFromDescr
to wrap the buffer in a ndarray
, then switch
the OWNDATA
flag to true. When the ndarray
is released, the
appropriate function from the ndarray
’s PyDataMem_Handler
should be
called to free the buffer. But the PyDataMem_Handler
field was never set,
it will be NULL
. For backward compatibility, NumPy will call free()
to
release the buffer. If NUMPY_WARN_IF_NO_MEM_POLICY
is set to 1
, a
warning will be emitted. The current default is not to emit a warning, this may
change in a future version of NumPy.
A better technique would be to use a PyCapsule
as a base object:
/* define a PyCapsule_Destructor, using the correct deallocator for buff */
void free_wrap(void *capsule){
void * obj = PyCapsule_GetPointer(capsule, PyCapsule_GetName(capsule));
free(obj);
};
/* then inside the function that creates arr from buff */
...
arr = PyArray_NewFromDescr(... buf, ...);
if (arr == NULL) {
return NULL;
}
capsule = PyCapsule_New(buf, "my_wrapped_buffer",
(PyCapsule_Destructor)&free_wrap);
if (PyArray_SetBaseObject(arr, capsule) == -1) {
Py_DECREF(arr);
return NULL;
}
...
Example of memory tracing with np.lib.tracemalloc_domain
#
Note that since Python 3.6 (or newer), the builtin tracemalloc
module can be used to
track allocations inside NumPy. NumPy places its CPU memory allocations into the
np.lib.tracemalloc_domain
domain.
For additional information, check: https://docs.python.org/3/library/tracemalloc.html.
Here is an example on how to use np.lib.tracemalloc_domain
:
"""
The goal of this example is to show how to trace memory
from an application that has NumPy and non-NumPy sections.
We only select the sections using NumPy related calls.
"""
import tracemalloc
import numpy as np
# Flag to determine if we select NumPy domain
use_np_domain = True
nx = 300
ny = 500
# Start to trace memory
tracemalloc.start()
# Section 1
# ---------
# NumPy related call
a = np.zeros((nx,ny))
# non-NumPy related call
b = [i**2 for i in range(nx*ny)]
snapshot1 = tracemalloc.take_snapshot()
# We filter the snapshot to only select NumPy related calls
np_domain = np.lib.tracemalloc_domain
dom_filter = tracemalloc.DomainFilter(inclusive=use_np_domain,
domain=np_domain)
snapshot1 = snapshot1.filter_traces([dom_filter])
top_stats1 = snapshot1.statistics('traceback')
print("================ SNAPSHOT 1 =================")
for stat in top_stats1:
print(f"{stat.count} memory blocks: {stat.size / 1024:.1f} KiB")
print(stat.traceback.format()[-1])
# Clear traces of memory blocks allocated by Python
# before moving to the next section.
tracemalloc.clear_traces()
# Section 2
#----------
# We are only using NumPy
c = np.sum(a*a)
snapshot2 = tracemalloc.take_snapshot()
top_stats2 = snapshot2.statistics('traceback')
print()
print("================ SNAPSHOT 2 =================")
for stat in top_stats2:
print(f"{stat.count} memory blocks: {stat.size / 1024:.1f} KiB")
print(stat.traceback.format()[-1])
tracemalloc.stop()
print()
print("============================================")
print("\nTracing Status : ", tracemalloc.is_tracing())
try:
print("\nTrying to Take Snapshot After Tracing is Stopped.")
snap = tracemalloc.take_snapshot()
except Exception as e:
print("Exception : ", e)