Prob.itsample#
Inverse transform sampling, for sampling from arbitrary probability density functions.
- adaptive_chebfit(pdf, lower_bd, upper_bd, eps=1e-15)#
Fit a chebyshev polynomial, increasing sampling rate until coefficient tail falls below provided tolerance.
- Parameters:
pdf (function, float -> float) – The probability density function (not necessarily normalized). Must take floats or ints as input, and return floats as an output.
lower_bd (float) – Lower bound of the support of the pdf. This parameter allows one to manually establish cutoffs for the density.
upper_bd (float) – Upper bound of the support of the pdf.
eps (float) – Error tolerance of Chebyshev polynomial fit of PDF.
- Returns:
x (array) – The nodes at which the polynomial interpolation takes place. These are adaptively chosen based on the provided tolerance.
coeffs (array) – Coefficients in Chebyshev approximation of the PDF.
Notes
This fit defines the “error” as the magnitude of the tail of the Chebyshev coefficients. Computing the true error (i.e. discrepancy between the PDF and it’s approximant) would be much slower, so we avoid it and use this rough approximation in its place.
- chebcdf(pdf, lower_bd, upper_bd, eps=1e-15)#
Get Chebyshev approximation of the CDF.
- Parameters:
pdf (function, float -> float) – The probability density function (not necessarily normalized). Must take floats or ints as input, and return floats as an output.
lower_bd (float) – Lower bound of the support of the pdf. This parameter allows one to manually establish cutoffs for the density.
upper_bd (float) – Upper bound of the support of the pdf.
eps (float) – Error tolerance of Chebyshev polynomial fit of PDF.
- Returns:
cdf – The cumulative density function of the (normalized version of the) provided pdf. The function cdf() takes an iterable of floats or doubles as an argument, and returns an iterable of floats of the same length.
- Return type:
function
- get_cdf(pdf, lower_bd=-inf, upper_bd=inf)#
Generate a CDF from a (possibly not normalized) pdf.
- Parameters:
pdf (function, float -> float) – The probability density function (not necessarily normalized). Must take floats or ints as input, and return floats as an output.
lower_bd (float) – Lower bound of the support of the pdf. This parameter allows one to manually establish cutoffs for the density.
upper_bd (float) – Upper bound of the support of the pdf.
- Returns:
cdf – The cumulative density function of the (normalized version of the) provided pdf. Will return a float if provided with a float or int; will return a numpy array if provided with an iterable.
- Return type:
function
- normalize(pdf, lower_bd=-inf, upper_bd=inf, vectorize=False)#
Normalize a non-normalized PDF.
- Parameters:
pdf (function, float -> float) – The probability density function (not necessarily normalized). Must take floats or ints as input, and return floats as an output.
lower_bd (float) – Lower bound of the support of the pdf. This parameter allows one to manually establish cutoffs for the density.
upper_bd (float) – Upper bound of the support of the pdf.
vectorize (boolean) – Vectorize the function. This slows down function calls, and so is generally set to False.
- Returns:
pdf_norm – Function with same signature as pdf, but normalized so that the integral between lower_bd and upper_bd is close to 1. Maps nicely over iterables.
- Return type:
function
- sample(pdf, num_samples, lower_bd=-inf, upper_bd=inf, guess=0, chebyshev=False)#
Sample from an arbitrary, unnormalized PDF.
- Parameters:
pdf (function, float -> float) – The probability density function (not necessarily normalized). Must take floats or ints as input, and return floats as an output.
num_samples (int) – The number of samples to be generated.
lower_bd (float) – Lower bound of the support of the pdf. This parameter allows one to manually establish cutoffs for the density.
upper_bd (float) – Upper bound of the support of the pdf.
guess (float or int) – Initial guess for the numerical solver to use when inverting the CDF.
chebyshev (Boolean, optional (default=False)) – If True, then the CDF is approximated using Chebyshev polynomials.
- Returns:
samples – An array of samples from the provided PDF, with support between lower_bd and upper_bd.
- Return type:
numpy array
Notes
For a unimodal distribution, the mode is a good choice for the parameter guess. Any number for which the CDF is not extremely close to 0 or 1 should be acceptable. If the cdf(guess) is near 1 or 0, then its derivative is near 0, and so the numerical root finder will be very slow to converge.
This sampling technique is slow (~3 ms/sample for a unit normal with initial guess of 0), since we re-integrate to get the CDF at every iteration of the numerical root-finder. This is improved somewhat by using Chebyshev approximations of the CDF, but the sampling rate is still prohibitively slow for >100k samples.
- uniform(low=0.0, high=1.0, size=None)#
Draw samples from a uniform distribution.
Samples are uniformly distributed over the half-open interval
[low, high)
(includes low, but excludes high). In other words, any value within the given interval is equally likely to be drawn by uniform.Note
New code should use the ~numpy.random.Generator.uniform method of a ~numpy.random.Generator instance instead; please see the random-quick-start.
- Parameters:
low (float or array_like of floats, optional) – Lower boundary of the output interval. All values generated will be greater than or equal to low. The default value is 0.
high (float or array_like of floats) – Upper boundary of the output interval. All values generated will be less than or equal to high. The high limit may be included in the returned array of floats due to floating-point rounding in the equation
low + (high-low) * random_sample()
. The default value is 1.0.size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g.,
(m, n, k)
, thenm * n * k
samples are drawn. If size isNone
(default), a single value is returned iflow
andhigh
are both scalars. Otherwise,np.broadcast(low, high).size
samples are drawn.
- Returns:
out – Drawn samples from the parameterized uniform distribution.
- Return type:
ndarray or scalar
See also
randint
Discrete uniform distribution, yielding integers.
random_integers
Discrete uniform distribution over the closed interval
[low, high]
.random_sample
Floats uniformly distributed over
[0, 1)
.random
Alias for random_sample.
rand
Convenience function that accepts dimensions as input, e.g.,
rand(2,2)
would generate a 2-by-2 array of floats, uniformly distributed over[0, 1)
.random.Generator.uniform
which should be used for new code.
Notes
The probability density function of the uniform distribution is
\[p(x) = \frac{1}{b - a}\]anywhere within the interval
[a, b)
, and zero elsewhere.When
high
==low
, values oflow
will be returned. Ifhigh
<low
, the results are officially undefined and may eventually raise an error, i.e. do not rely on this function to behave when passed arguments satisfying that inequality condition. Thehigh
limit may be included in the returned array of floats due to floating-point rounding in the equationlow + (high-low) * random_sample()
. For example:>>> x = np.float32(5*0.99999999) >>> x 5.0
Examples
Draw samples from the distribution:
>>> s = np.random.uniform(-1,0,1000)
All values are within the given interval:
>>> np.all(s >= -1) True >>> np.all(s < 0) True
Display the histogram of the samples, along with the probability density function:
>>> import matplotlib.pyplot as plt >>> count, bins, ignored = plt.hist(s, 15, density=True) >>> plt.plot(bins, np.ones_like(bins), linewidth=2, color='r') >>> plt.show()