numpy.random.Generator.zipf

method

random.Generator.zipf(a, size=None)

Draw samples from a Zipf distribution.

Samples are drawn from a Zipf distribution with specified parameter a > 1.

The Zipf distribution (also known as the zeta distribution) is a discrete probability distribution that satisfies Zipf’s law: the frequency of an item is inversely proportional to its rank in a frequency table.

Parameters:
a : float or array_like of floats

Distribution parameter. Must be greater than 1.

size : int or tuple of ints, optional

Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a is a scalar. Otherwise, np.array(a).size samples are drawn.

Returns:
out : ndarray or scalar

Drawn samples from the parameterized Zipf distribution.
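
The shape rules described for size can be sketched as follows. This is a minimal illustration rather than part of the reference text; the seed 12345 and the particular values of a are arbitrary assumptions chosen for the example.

```python
import numpy as np

# Seed is arbitrary; it only makes the sketch repeatable.
rng = np.random.default_rng(12345)

# Scalar a, size=None: a single integer is returned.
single = rng.zipf(2.0)

# Scalar a with an explicit shape: 2 * 3 = 6 samples.
block = rng.zipf(2.0, size=(2, 3))

# Array-like a, size=None: one sample per element of a.
per_a = rng.zipf([1.5, 2.0, 4.0])

# Array-like a broadcast against an explicit shape (last axis matches a).
grid = rng.zipf([1.5, 2.0, 4.0], size=(5, 3))

print(np.shape(single), block.shape, per_a.shape, grid.shape)
```

Every drawn value is an integer greater than or equal to 1, whichever form of a and size is used.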

See also

scipy.stats.zipf

probability mass function, distribution, or cumulative distribution function, etc.

Notes

The probability mass function (PMF) for the Zipf distribution is

$$p(k) = \frac{k^{-a}}{\zeta(a)},$$

for integers $k \geq 1$, where $\zeta$ is the Riemann zeta function.
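
As a quick numerical check of the formula above, the sketch below evaluates $p(k)$ directly and confirms that it sums to approximately 1 and that $p(1)/p(2) = 2^a$. It assumes a = 4.0 so that the closed form $\zeta(4) = \pi^4/90$ can be used without SciPy; for a general a, scipy.special.zeta(a) would take its place. The truncation point 10_000 is an arbitrary choice.

```python
import numpy as np

a = 4.0
zeta_a = np.pi**4 / 90.0            # closed form for zeta(4)

# Truncate the infinite support; the tail beyond 10_000 is negligible for a = 4.
k = np.arange(1, 10_001, dtype=float)
pmf = k**-a / zeta_a

total = pmf.sum()                   # close to 1
ratio = pmf[0] / pmf[1]             # p(1) / p(2) = 2**a
print(total, ratio)
```

For values of a closer to 1 the tail decays much more slowly, so a truncated sum like this one would need far more terms to approach 1.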

It is named for the American linguist George Kingsley Zipf, who noted that the frequency of any word in a sample of a language is inversely proportional to its rank in the frequency table.

References

[1] Zipf, G. K., “Selected Studies of the Principle of Relative Frequency in Language,” Cambridge, MA: Harvard Univ. Press, 1932.

Examples

Draw samples from the distribution:

>>> a = 4.0
>>> n = 20000
>>> rng = np.random.default_rng()
>>> s = rng.zipf(a, size=n)


Display the histogram of the samples, along with the expected histogram based on the probability mass function:

>>> import matplotlib.pyplot as plt
>>> from scipy.special import zeta


bincount provides a fast histogram for small integers.

>>> count = np.bincount(s)
>>> k = np.arange(1, s.max() + 1)

>>> plt.bar(k, count[1:], alpha=0.5, label='sample count')
>>> plt.plot(k, n*(k**-a)/zeta(a), 'k.-', alpha=0.5,
...          label='expected count')
>>> plt.semilogy()
>>> plt.grid(alpha=0.4)
>>> plt.legend()
>>> plt.title(f'Zipf sample, a={a}, size={n}')
>>> plt.show()
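
A numeric comparison can complement the plot. The sketch below is an illustration rather than part of the original example: it compares the observed fraction of samples equal to 1 and 2 with the PMF for a = 2.0, using the closed form $\zeta(2) = \pi^2/6$. The seed, sample size, and choice of a are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2024)   # arbitrary seed for repeatability
a = 2.0
n = 200_000
s = rng.zipf(a, size=n)

zeta_a = np.pi**2 / 6.0             # closed form for zeta(2)

# Compare observed frequencies with p(k) = k**-a / zeta(a) for k = 1, 2.
# Counting matches directly avoids np.bincount, whose output length
# grows with s.max() and can be very large when a is small.
observed = {k: np.count_nonzero(s == k) / n for k in (1, 2)}
expected = {k: k**-a / zeta_a for k in (1, 2)}

for k in (1, 2):
    print(f"k={k}: observed {observed[k]:.4f}, expected {expected[k]:.4f}")
```

With this many samples the observed and expected fractions should agree to roughly two decimal places; the exact figures depend on the seed.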