Performance
Recommendation
The recommended generator for general use is PCG64, or its upgraded variant PCG64DXSM for heavily parallel use cases. Both are statistically high quality, full-featured, and fast on most platforms, but somewhat slow when compiled for 32-bit processes. See Upgrading PCG64 with PCG64DXSM for details on when heavy parallelism would indicate using PCG64DXSM.
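As a minimal sketch of the recommendation above, either bit generator can be wrapped in a Generator. The seed value and the variable names are arbitrary choices for illustration, and PCG64DXSM is assumed to be available in the installed NumPy version.

```python
from numpy.random import Generator, PCG64, PCG64DXSM

seed = 12345  # arbitrary illustrative seed, not a recommended value

# General-purpose choice.
rng = Generator(PCG64(seed))

# Upgraded variant, intended for heavily parallel use cases.
rng_dxsm = Generator(PCG64DXSM(seed))

# Both expose the same Generator methods; only the underlying stream differs.
values = rng.standard_normal(5)
values_dxsm = rng_dxsm.standard_normal(5)
```

The rest of a program does not need to know which bit generator was chosen, so switching between the two is a one-line change.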
Philox is fairly slow, but its statistical properties are of very high quality, and it is easy to get assuredly independent streams by using unique keys. If that is the style you wish to use for parallel streams, or you are porting from another system that uses that style, then Philox is your choice.
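The unique-key style described above might look like the following sketch. The base key and the number of streams are hypothetical values chosen for illustration; the only requirement assumed here is that each stream receives a different key.

```python
from numpy.random import Generator, Philox

base_key = 0xC0FFEE  # hypothetical base key; any fixed integer would do
n_streams = 4        # hypothetical number of parallel workers

# One generator per worker, each with its own unique key.
streams = [Generator(Philox(key=base_key + i)) for i in range(n_streams)]

# Each worker draws from its own stream.
draws = [g.random(3) for g in streams]
```

Reusing a key reproduces the same stream, so the key assignment should be deterministic if results need to be repeatable.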
SFC64 is statistically high quality and very fast. However, it lacks jumpability. If you are not using that capability and want lots of speed, even on 32-bit processes, this is your choice.
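To make "jumpability" concrete, the sketch below jumps a PCG64 bit generator and, for SFC64, falls back on spawning child seeds from a SeedSequence instead. Using SeedSequence.spawn here is one possible way to obtain separate SFC64 streams and is an assumption of this example, not something the recommendation above prescribes.

```python
from numpy.random import Generator, PCG64, SFC64, SeedSequence

seed = 2024  # arbitrary illustrative seed

# PCG64 is jumpable: jumped() returns a new bit generator whose state
# is as if a very large number of draws had already been made.
pcg = PCG64(seed)
pcg_far_ahead = pcg.jumped()
first = Generator(pcg).random()
first_after_jump = Generator(pcg_far_ahead).random()

# SFC64 is not used with jumps here. Child seeds spawned from one
# SeedSequence give separate streams instead.
children = SeedSequence(seed).spawn(3)
sfc_streams = [Generator(SFC64(child)) for child in children]
sfc_draws = [g.random() for g in sfc_streams]
```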
MT19937 fails some statistical tests and is not especially fast compared to modern PRNGs. For these reasons, we mostly do not recommend using it on its own, only through the legacy RandomState for reproducing old results. That said, it has a very long history as a default in many systems.
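Reproducing old results through the legacy interface can be sketched as follows. The seed is an arbitrary example; the point is only that a RandomState seeded the same way yields the same values on every run.

```python
from numpy.random import RandomState

legacy_seed = 0  # arbitrary example seed, standing in for one used in old code

# The legacy interface is backed by MT19937.
legacy = RandomState(legacy_seed)
algorithm_name = legacy.get_state()[0]

# Same seed, same stream: this is what makes old results reproducible.
first_draw = legacy.random_sample()
replayed_draw = RandomState(legacy_seed).random_sample()
```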
Timings
The timings below are the time in ns to produce 1 random value from a specific distribution. The original MT19937 generator is much slower since it requires 2 32-bit values to equal the output of the faster generators.
Integer performance has a similar ordering.
The pattern is similar for other, more complex distributions. The performance of the legacy RandomState generator for Normals is much lower than the others since it uses the Box-Muller transform rather than the Ziggurat method. The performance gap for Exponentials is also large due to the cost of computing the log function to invert the CDF.
The column labeled MT19937 uses the same 32-bit generator as RandomState but produces random variates using Generator.
Distribution | MT19937 | PCG64 | PCG64DXSM | Philox | SFC64 | RandomState
---|---|---|---|---|---|---
32-bit Unsigned Ints | 3.3 | 1.9 | 2.0 | 3.3 | 1.8 | 3.1
64-bit Unsigned Ints | 5.6 | 3.2 | 2.9 | 4.9 | 2.5 | 5.5
Uniforms | 5.9 | 3.1 | 2.9 | 5.0 | 2.6 | 6.0
Normals | 13.9 | 10.8 | 10.5 | 12.0 | 8.3 | 56.8
Exponentials | 9.1 | 6.0 | 5.8 | 8.1 | 5.4 | 63.9
Gammas | 37.2 | 30.8 | 28.9 | 34.0 | 27.5 | 77.0
Binomials | 21.3 | 17.4 | 17.6 | 19.3 | 15.6 | 21.4
Laplaces | 73.2 | 72.3 | 76.1 | 73.0 | 72.3 | 82.5
Poissons | 111.7 | 103.4 | 100.5 | 109.4 | 90.7 | 115.2
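Per-value timings of this kind can be approximated with a sketch like the one below. It is not the benchmark script that produced the table: the helper name `ns_per_value`, the sample size, and the repeat count are assumptions made for illustration, and the numbers will vary by machine.

```python
import timeit

from numpy.random import Generator, MT19937, PCG64, PCG64DXSM, Philox, SFC64


def ns_per_value(bit_generator_cls, draw, size=100_000, repeat=3):
    """Best-of-`repeat` time, in ns, to produce one value with `draw`."""
    rng = Generator(bit_generator_cls(12345))  # arbitrary fixed seed
    timings = timeit.repeat(lambda: draw(rng, size), number=1, repeat=repeat)
    return min(timings) / size * 1e9


def draw_normals(rng, n):
    return rng.standard_normal(n)


# Time standard normals for each bit generator.
results = {
    cls.__name__: ns_per_value(cls, draw_normals)
    for cls in (MT19937, PCG64, PCG64DXSM, Philox, SFC64)
}
```

Taking the minimum over several repeats reduces the influence of unrelated system activity on the measurement.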
The next table presents the performance as a percentage relative to values generated by the legacy generator, RandomState(MT19937()). A value of 100 matches the legacy generator, and higher values are faster. The overall performance was computed using a geometric mean.
Distribution | MT19937 | PCG64 | PCG64DXSM | Philox | SFC64
---|---|---|---|---|---
32-bit Unsigned Ints | 96 | 162 | 160 | 96 | 175
64-bit Unsigned Ints | 97 | 171 | 188 | 113 | 218
Uniforms | 102 | 192 | 206 | 121 | 233
Normals | 409 | 526 | 541 | 471 | 684
Exponentials | 701 | 1071 | 1101 | 784 | 1179
Gammas | 207 | 250 | 266 | 227 | 281
Binomials | 100 | 123 | 122 | 111 | 138
Laplaces | 113 | 114 | 108 | 113 | 114
Poissons | 103 | 111 | 115 | 105 | 127
Overall | 159 | 219 | 225 | 174 | 251
Note
All timings were taken using Linux on an AMD Ryzen 9 3900X processor.
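The relationship between the two tables above can be checked with a short calculation. The sketch below uses the rounded PCG64 and RandomState timings from the nanosecond table, so its results only approximate the published percentages, which were presumably computed from unrounded measurements. The helper names are illustrative.

```python
import math

# ns per value from the timing table, in row order:
# 32-bit ints, 64-bit ints, uniforms, normals, exponentials,
# gammas, binomials, laplaces, poissons.
ns_per_value = {
    "PCG64": [1.9, 3.2, 3.1, 10.8, 6.0, 30.8, 17.4, 72.3, 103.4],
    "RandomState": [3.1, 5.5, 6.0, 56.8, 63.9, 77.0, 21.4, 82.5, 115.2],
}


def relative_percent(times, baseline):
    """Speed relative to the baseline: 100 means equal, higher is faster."""
    return [100.0 * b / t for t, b in zip(times, baseline)]


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


pcg64_relative = relative_percent(
    ns_per_value["PCG64"], ns_per_value["RandomState"]
)
pcg64_overall = geometric_mean(pcg64_relative)
```

A geometric mean is the natural summary for ratios like these, since it treats a factor-of-two speedup and a factor-of-two slowdown symmetrically.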
Performance on different Operating Systems
Performance differs across platforms due to differences in compilers and hardware (e.g., register width). The default bit generator has been chosen to perform well on 64-bit platforms. Performance on 32-bit operating systems is very different.
The values reported are normalized relative to the speed of MT19937 in each table. A value of 100 indicates that the performance matches MT19937. Higher values indicate improved performance. These values cannot be compared across tables.
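Because the picture changes on 32-bit processes, a program that cares about speed could pick its bit generator based on the pointer width of the running interpreter. The sketch below is one hypothetical policy, not NumPy behavior: the helper names `pointer_bits` and `pick_bit_generator` are made up for this example, and the choice of SFC64 on 32-bit follows the recommendation earlier on this page.

```python
import struct

from numpy.random import Generator, PCG64, SFC64


def pointer_bits():
    """Pointer width of the running Python process, in bits."""
    return struct.calcsize("P") * 8


def pick_bit_generator(bits=None):
    """Hypothetical policy: PCG64 on 64-bit processes, SFC64 otherwise."""
    if bits is None:
        bits = pointer_bits()
    return PCG64 if bits >= 64 else SFC64


# Arbitrary illustrative seed.
rng = Generator(pick_bit_generator()(12345))
sample = rng.random(3)
```

Note that the streams produced by different bit generators differ, so a policy like this trades cross-platform reproducibility for speed.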
64-bit Linux
Distribution | MT19937 | PCG64 | PCG64DXSM | Philox | SFC64
---|---|---|---|---|---
32-bit Unsigned Ints | 100 | 168 | 166 | 100 | 182
64-bit Unsigned Ints | 100 | 176 | 193 | 116 | 224
Uniforms | 100 | 188 | 202 | 118 | 228
Normals | 100 | 128 | 132 | 115 | 167
Exponentials | 100 | 152 | 157 | 111 | 168
Overall | 100 | 161 | 168 | 112 | 192
64-bit Windows
The relative performance on 64-bit Linux and 64-bit Windows is broadly similar with the notable exception of the Philox generator.
Distribution | MT19937 | PCG64 | PCG64DXSM | Philox | SFC64
---|---|---|---|---|---
32-bit Unsigned Ints | 100 | 155 | 131 | 29 | 150
64-bit Unsigned Ints | 100 | 157 | 143 | 25 | 154
Uniforms | 100 | 151 | 144 | 24 | 155
Normals | 100 | 129 | 128 | 37 | 150
Exponentials | 100 | 150 | 145 | 28 | 159
Overall | 100 | 148 | 138 | 28 | 154
32-bit Windows
The performance of 64-bit generators on 32-bit Windows is much lower than on 64-bit operating systems due to register width. MT19937, the generator that has been in NumPy since 2005, operates on 32-bit integers.
Distribution | MT19937 | PCG64 | PCG64DXSM | Philox | SFC64
---|---|---|---|---|---
32-bit Unsigned Ints | 100 | 24 | 34 | 14 | 57
64-bit Unsigned Ints | 100 | 21 | 32 | 14 | 74
Uniforms | 100 | 21 | 34 | 16 | 73
Normals | 100 | 36 | 57 | 28 | 101
Exponentials | 100 | 28 | 44 | 20 | 88
Overall | 100 | 25 | 39 | 18 | 77
Note
Linux timings used Ubuntu 20.04 and GCC 9.3.0. Windows timings were made on Windows 10 using Microsoft C/C++ Optimizing Compiler Version 19 (Visual Studio 2019). All timings were produced on an AMD Ryzen 9 3900X processor.