Prob.itsample#

Inverse transform sampling, for sampling from arbitrary probability density functions.

adaptive_chebfit(pdf, lower_bd, upper_bd, eps=1e-15)#

Fit a chebyshev polynomial, increasing sampling rate until coefficient tail falls below provided tolerance.

Parameters:
  • pdf (function, float -> float) – The probability density function (not necessarily normalized). Must take floats or ints as input, and return floats as an output.

  • lower_bd (float) – Lower bound of the support of the pdf. This parameter allows one to manually establish cutoffs for the density.

  • upper_bd (float) – Upper bound of the support of the pdf.

  • eps (float) – Error tolerance of Chebyshev polynomial fit of PDF.

Returns:

  • x (array) – The nodes at which the polynomial interpolation takes place. These are adaptively chosen based on the provided tolerance.

  • coeffs (array) – Coefficients in Chebyshev approximation of the PDF.

Notes

This fit defines the “error” as the magnitude of the tail of the Chebyshev coefficients. Computing the true error (i.e. discrepancy between the PDF and it’s approximant) would be much slower, so we avoid it and use this rough approximation in its place.

chebcdf(pdf, lower_bd, upper_bd, eps=1e-15)#

Get Chebyshev approximation of the CDF.

Parameters:
  • pdf (function, float -> float) – The probability density function (not necessarily normalized). Must take floats or ints as input, and return floats as an output.

  • lower_bd (float) – Lower bound of the support of the pdf. This parameter allows one to manually establish cutoffs for the density.

  • upper_bd (float) – Upper bound of the support of the pdf.

  • eps (float) – Error tolerance of Chebyshev polynomial fit of PDF.

Returns:

cdf – The cumulative density function of the (normalized version of the) provided pdf. The function cdf() takes an iterable of floats or doubles as an argument, and returns an iterable of floats of the same length.

Return type:

function

get_cdf(pdf, lower_bd=-inf, upper_bd=inf)#

Generate a CDF from a (possibly not normalized) pdf.

Parameters:
  • pdf (function, float -> float) – The probability density function (not necessarily normalized). Must take floats or ints as input, and return floats as an output.

  • lower_bd (float) – Lower bound of the support of the pdf. This parameter allows one to manually establish cutoffs for the density.

  • upper_bd (float) – Upper bound of the support of the pdf.

Returns:

cdf – The cumulative density function of the (normalized version of the) provided pdf. Will return a float if provided with a float or int; will return a numpy array if provided with an iterable.

Return type:

function

normalize(pdf, lower_bd=-inf, upper_bd=inf, vectorize=False)#

Normalize a non-normalized PDF.

Parameters:
  • pdf (function, float -> float) – The probability density function (not necessarily normalized). Must take floats or ints as input, and return floats as an output.

  • lower_bd (float) – Lower bound of the support of the pdf. This parameter allows one to manually establish cutoffs for the density.

  • upper_bd (float) – Upper bound of the support of the pdf.

  • vectorize (boolean) – Vectorize the function. This slows down function calls, and so is generally set to False.

Returns:

pdf_norm – Function with same signature as pdf, but normalized so that the integral between lower_bd and upper_bd is close to 1. Maps nicely over iterables.

Return type:

function

sample(pdf, num_samples, lower_bd=-inf, upper_bd=inf, guess=0, chebyshev=False)#

Sample from an arbitrary, unnormalized PDF.

Parameters:
  • pdf (function, float -> float) – The probability density function (not necessarily normalized). Must take floats or ints as input, and return floats as an output.

  • num_samples (int) – The number of samples to be generated.

  • lower_bd (float) – Lower bound of the support of the pdf. This parameter allows one to manually establish cutoffs for the density.

  • upper_bd (float) – Upper bound of the support of the pdf.

  • guess (float or int) – Initial guess for the numerical solver to use when inverting the CDF.

  • chebyshev (Boolean, optional (default=False)) – If True, then the CDF is approximated using Chebyshev polynomials.

Returns:

samples – An array of samples from the provided PDF, with support between lower_bd and upper_bd.

Return type:

numpy array

Notes

For a unimodal distribution, the mode is a good choice for the parameter guess. Any number for which the CDF is not extremely close to 0 or 1 should be acceptable. If the cdf(guess) is near 1 or 0, then its derivative is near 0, and so the numerical root finder will be very slow to converge.

This sampling technique is slow (~3 ms/sample for a unit normal with initial guess of 0), since we re-integrate to get the CDF at every iteration of the numerical root-finder. This is improved somewhat by using Chebyshev approximations of the CDF, but the sampling rate is still prohibitively slow for >100k samples.

uniform(low=0.0, high=1.0, size=None)#

Draw samples from a uniform distribution.

Samples are uniformly distributed over the half-open interval [low, high) (includes low, but excludes high). In other words, any value within the given interval is equally likely to be drawn by uniform.

Note

New code should use the ~numpy.random.Generator.uniform method of a ~numpy.random.Generator instance instead; please see the random-quick-start.

Parameters:
  • low (float or array_like of floats, optional) – Lower boundary of the output interval. All values generated will be greater than or equal to low. The default value is 0.

  • high (float or array_like of floats) – Upper boundary of the output interval. All values generated will be less than or equal to high. The high limit may be included in the returned array of floats due to floating-point rounding in the equation low + (high-low) * random_sample(). The default value is 1.0.

  • size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if low and high are both scalars. Otherwise, np.broadcast(low, high).size samples are drawn.

Returns:

out – Drawn samples from the parameterized uniform distribution.

Return type:

ndarray or scalar

See also

randint

Discrete uniform distribution, yielding integers.

random_integers

Discrete uniform distribution over the closed interval [low, high].

random_sample

Floats uniformly distributed over [0, 1).

random

Alias for random_sample.

rand

Convenience function that accepts dimensions as input, e.g., rand(2,2) would generate a 2-by-2 array of floats, uniformly distributed over [0, 1).

random.Generator.uniform

which should be used for new code.

Notes

The probability density function of the uniform distribution is

\[p(x) = \frac{1}{b - a}\]

anywhere within the interval [a, b), and zero elsewhere.

When high == low, values of low will be returned. If high < low, the results are officially undefined and may eventually raise an error, i.e. do not rely on this function to behave when passed arguments satisfying that inequality condition. The high limit may be included in the returned array of floats due to floating-point rounding in the equation low + (high-low) * random_sample(). For example:

>>> x = np.float32(5*0.99999999)
>>> x
5.0

Examples

Draw samples from the distribution:

>>> s = np.random.uniform(-1,0,1000)

All values are within the given interval:

>>> np.all(s >= -1)
True
>>> np.all(s < 0)
True

Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s, 15, density=True)
>>> plt.plot(bins, np.ones_like(bins), linewidth=2, color='r')
>>> plt.show()