Number of Threads used for Linear Algebra
NumPy itself is normally intentionally limited to a single thread
during function calls; however, it does support multiple Python
threads running at the same time.
Note that for performant linear algebra, NumPy uses a BLAS backend
such as OpenBLAS or MKL, which may use multiple threads. These threads
may be controlled by environment variables such as OMP_NUM_THREADS,
depending on which backend is used.
One way to control the number of threads is the package threadpoolctl.
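As a minimal sketch of limiting BLAS threads from Python, the helper below wraps `threadpool_limits` from the third-party threadpoolctl package. The helper name `limit_blas_threads` is hypothetical, and the fallback to a no-op when threadpoolctl is not installed is an assumption made here so the example runs anywhere.

```python
import contextlib


def limit_blas_threads(max_threads):
    """Return a context manager that caps the BLAS thread pool size.

    Hypothetical helper: falls back to a no-op context manager when the
    optional threadpoolctl package is not installed.
    """
    try:
        from threadpoolctl import threadpool_limits
    except ImportError:
        return contextlib.nullcontext()
    # user_api="blas" restricts the limit to BLAS libraries (OpenBLAS, MKL, ...).
    return threadpool_limits(limits=max_threads, user_api="blas")


with limit_blas_threads(2):
    # Linear algebra calls made inside this block (for example a large
    # matrix product) would use at most 2 BLAS threads.
    pass
```

Unlike an environment variable such as OMP_NUM_THREADS, which is normally set before the process starts, this approach can change the limit for a single region of a running program.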
Madvise Hugepage on Linux
When working with very large arrays on modern Linux kernels,
you can experience a significant speedup when transparent hugepages
are used.
The current system policy for transparent hugepages can be seen with:
cat /sys/kernel/mm/transparent_hugepage/enabled
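The same policy can also be read from Python. The sketch below relies on the kernel marking the active choice with square brackets (for example `always [madvise] never`); the function names `parse_thp_policy` and `current_thp_policy` are hypothetical and used only for illustration.

```python
from pathlib import Path

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_policy(text):
    """Return the bracketed (active) policy from the sysfs file contents."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def current_thp_policy(path=THP_ENABLED):
    """Return the active policy, or None if the file cannot be read."""
    try:
        return parse_thp_policy(path.read_text())
    except OSError:
        # Not Linux, or transparent hugepages are not available.
        return None


print(current_thp_policy())
```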
When set to madvise, NumPy will typically use hugepages for a performance
boost. This behaviour can be modified by setting the environment variable:
NUMPY_MADVISE_HUGEPAGE=0
to disable it, or setting it to 1 to always enable it. When not set, the default
is to use madvise on kernels 4.6 and newer. These kernels presumably
experience a large speedup with hugepage support.
This flag is checked at import time.
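Because the flag is read at import time, it has to be in the environment before NumPy is first imported. The following is a minimal sketch, assuming the variable is NUMPY_MADVISE_HUGEPAGE as named in the NumPy documentation; the helper `set_madvise_hugepage` is hypothetical.

```python
import os
import sys


def set_madvise_hugepage(enable):
    """Set NUMPY_MADVISE_HUGEPAGE and report whether it can still take effect.

    Returns False if NumPy was already imported in this process, in which
    case the new value is too late to influence it.
    """
    already_imported = "numpy" in sys.modules
    os.environ["NUMPY_MADVISE_HUGEPAGE"] = "1" if enable else "0"
    return not already_imported


# Call this before the first `import numpy` in the program.
in_time = set_madvise_hugepage(False)
print("setting will be picked up on import:", in_time)
```

Setting the variable in the shell that launches the program achieves the same result without any code changes.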