numpy.random.Generator.multivariate_hypergeometric
method
- random.Generator.multivariate_hypergeometric(colors, nsample, size=None, method='marginals')
- Generate variates from a multivariate hypergeometric distribution.

  The multivariate hypergeometric distribution is a generalization of the hypergeometric distribution.

  Choose nsample items at random without replacement from a collection with N distinct types. N is the length of colors, and the values in colors are the number of occurrences of that type in the collection. The total number of items in the collection is sum(colors). Each random variate generated by this function is a vector of length N holding the counts of the different types that occurred in the nsample items.

  The name colors comes from a common description of the distribution: it is the probability distribution of the number of marbles of each color selected without replacement from an urn containing marbles of different colors; colors[i] is the number of marbles in the urn with color i.

- Parameters:
- colors : sequence of integers
- The number of each type of item in the collection from which a sample is drawn. The values in colors must be nonnegative. To avoid loss of precision in the algorithm, sum(colors) must be less than 10**9 when method is “marginals”.
- nsample : int
- The number of items selected. nsample must not be greater than sum(colors).
- size : int or tuple of ints, optional
- The number of variates to generate, either an integer or a tuple holding the shape of the array of variates. If the given size is, e.g., (k, m), then k * m variates are drawn, where one variate is a vector of length len(colors), and the return value has shape (k, m, len(colors)). If size is an integer, the output has shape (size, len(colors)). Default is None, in which case a single variate is returned as an array with shape (len(colors),).
- method : string, optional
- Specify the algorithm that is used to generate the variates. Must be ‘count’ or ‘marginals’ (the default). See the Notes for a description of the methods. 
 
- Returns:
- variates : ndarray
- Array of variates drawn from the multivariate hypergeometric distribution. 
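
The shape rules described for size can be checked directly. The snippet below is a minimal sketch: the seed is arbitrary and chosen only for illustration, and the names single, batch, and grid are not part of the API.

```python
import numpy as np

rng = np.random.default_rng(12345)  # arbitrary seed, for illustration only
colors = [16, 8, 4]

single = rng.multivariate_hypergeometric(colors, 6)              # size=None
batch = rng.multivariate_hypergeometric(colors, 6, size=3)       # integer size
grid = rng.multivariate_hypergeometric(colors, 6, size=(2, 2))   # tuple size

print(single.shape)  # (3,)
print(batch.shape)   # (3, 3)
print(grid.shape)    # (2, 2, 3)

# Each variate is a vector of counts, so it sums to nsample along the
# last axis and never exceeds the available count of any type.
print(grid.sum(axis=-1))
```

The last axis always has length len(colors), so summing over axis=-1 recovers nsample for every variate regardless of the requested size.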
 
- See also
- hypergeometric
- Draw samples from the (univariate) hypergeometric distribution. 
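
With only two types, the first component of a multivariate variate follows the univariate hypergeometric distribution with ngood = colors[0] and nbad = colors[1]. The following sketch compares sample means from the two samplers against the theoretical mean nsample * ngood / (ngood + nbad); the seed, parameter values, and sample count are arbitrary choices for illustration, and the two samplers are not expected to produce identical streams.

```python
import numpy as np

rng = np.random.default_rng(2024)  # arbitrary seed, for illustration only
ngood, nbad, nsample = 10, 15, 7
ndraws = 20000

# Two-type multivariate draw: column 0 counts "good" items, column 1 "bad".
mv = rng.multivariate_hypergeometric([ngood, nbad], nsample, size=ndraws)

# Univariate draw with the same parameters.
uni = rng.hypergeometric(ngood, nbad, nsample, size=ndraws)

expected_mean = nsample * ngood / (ngood + nbad)
print(mv[:, 0].mean(), uni.mean(), expected_mean)
```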
- Notes

  The two methods do not return the same sequence of variates.

  The “count” algorithm is roughly equivalent to the following numpy code:

      choices = np.repeat(np.arange(len(colors)), colors)
      selection = np.random.choice(choices, nsample, replace=False)
      variate = np.bincount(selection, minlength=len(colors))

  The “count” algorithm uses a temporary array of integers with length sum(colors).

  The “marginals” algorithm generates a variate by using repeated calls to the univariate hypergeometric sampler. It is roughly equivalent to:

      variate = np.zeros(len(colors), dtype=np.int64)
      # `remaining` is the cumulative sum of `colors` from the last
      # element to the first; e.g. if `colors` is [3, 1, 5], then
      # `remaining` is [9, 6, 5].
      remaining = np.cumsum(colors[::-1])[::-1]
      for i in range(len(colors)-1):
          if nsample < 1:
              break
          variate[i] = hypergeometric(colors[i], remaining[i+1],
                                      nsample)
          nsample -= variate[i]
      variate[-1] = nsample

  The default method is “marginals”. For some cases (e.g. when colors contains relatively small integers), the “count” method can be significantly faster than the “marginals” method. If performance of the algorithm is important, test the two methods with typical inputs to decide which works best.

  New in version 1.18.0.

- Examples

      >>> colors = [16, 8, 4]
      >>> seed = 4861946401452
      >>> gen = np.random.Generator(np.random.PCG64(seed))
      >>> gen.multivariate_hypergeometric(colors, 6)
      array([5, 0, 1])
      >>> gen.multivariate_hypergeometric(colors, 6, size=3)
      array([[5, 0, 1],
             [2, 2, 2],
             [3, 3, 0]])
      >>> gen.multivariate_hypergeometric(colors, 6, size=(2, 2))
      array([[[3, 2, 1],
              [3, 2, 1]],
             [[4, 1, 1],
              [3, 2, 1]]])
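
The two "roughly equivalent" snippets in the Notes can be wrapped into self-contained functions for experimentation. The sketch below is illustrative only: count_sketch and marginals_sketch are hypothetical helper names, not part of NumPy, and they are not expected to reproduce the exact variates returned by multivariate_hypergeometric for a given seed. They assume 0 <= nsample <= sum(colors) and use a Generator instance in place of the module-level functions.

```python
import numpy as np


def count_sketch(rng, colors, nsample):
    """Hypothetical helper mirroring the "count" description in the Notes."""
    colors = np.asarray(colors)
    # One entry per item in the collection, labelled by its type index.
    # This temporary array has length sum(colors).
    choices = np.repeat(np.arange(len(colors)), colors)
    selection = rng.choice(choices, nsample, replace=False)
    return np.bincount(selection, minlength=len(colors))


def marginals_sketch(rng, colors, nsample):
    """Hypothetical helper mirroring the "marginals" description in the Notes."""
    colors = np.asarray(colors)
    variate = np.zeros(len(colors), dtype=np.int64)
    # remaining[i] is the number of items of type i or later,
    # e.g. colors [3, 1, 5] gives remaining [9, 6, 5].
    remaining = np.cumsum(colors[::-1])[::-1]
    for i in range(len(colors) - 1):
        if nsample < 1:
            break
        # Type i plays the role of "good"; all later types are "bad".
        variate[i] = rng.hypergeometric(colors[i], remaining[i + 1], nsample)
        nsample -= variate[i]
    # Whatever is left must come from the last type.
    variate[-1] = nsample
    return variate


rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
colors = [16, 8, 4]
v_count = count_sketch(rng, colors, 6)
v_marg = marginals_sketch(rng, colors, 6)
print(v_count, v_marg)
```

Both helpers return a vector of length len(colors) that sums to nsample. To compare the speed of the real implementations, time calls with method='count' and method='marginals' on inputs typical of the intended workload.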