kalepy.sample module

Perform sampling of distributions and functions.

class kalepy.sample.Sample_Grid(edges, dens, mass=None, scalar_dens=None, scalar_mass=None)

Bases: object

Sample from a given probability distribution evaluated on a regular grid.

The grid has probability densities (dens) evaluated at the grid edges, and probability masses (mass) corresponding to the centroid of each bin. The centroids are calculated from the edge positions, weighted by probability density. If mass is not given, it is calculated by integrating the densities over each bin (using the trapezoid rule).
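The mass and centroid definitions above can be illustrated for a 1D grid. This is a minimal plain-Python sketch, not kalepy's own code: `bin_masses` and `bin_centroids` are hypothetical helper names, and the centroid is assumed to be the mean of the two bounding edges weighted by the density at each edge.

```python
def bin_masses(edges, dens):
    """Trapezoid-rule mass of each bin on a 1D grid (hypothetical helper)."""
    return [
        0.5 * (dens[i] + dens[i + 1]) * (edges[i + 1] - edges[i])
        for i in range(len(edges) - 1)
    ]


def bin_centroids(edges, dens):
    """Density-weighted mean of the two edges bounding each bin (assumed reading)."""
    cents = []
    for i in range(len(edges) - 1):
        wsum = dens[i] + dens[i + 1]
        if wsum == 0:
            # No density information: fall back to the geometric bin center.
            cents.append(0.5 * (edges[i] + edges[i + 1]))
        else:
            cents.append((edges[i] * dens[i] + edges[i + 1] * dens[i + 1]) / wsum)
    return cents


edges = [0.0, 1.0, 3.0]   # three edges, two bins
dens = [0.0, 2.0, 1.0]    # density evaluated at each edge
mass = bin_masses(edges, dens)
cents = bin_centroids(edges, dens)
```

Note that `dens` has one value per edge, while `mass` and `cents` have one value per bin.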

Process for drawing ‘N’ samples from the distribution:

  1. Using the masses of each bin, the CDF is calculated.

  2. N random values are chosen, and the CDF is inverted to find which bin each one corresponds to. The CDF is flattened into 1D to accommodate any dimensionality of grid, and then the chosen bins are re-mapped to ND space.

  3. Within each bin, the position of each drawn sample is chosen proportionally to the probability density, based on the density-gradient within each cell.
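The three steps can be sketched for the 1D case, where the flattening in step 2 is trivial. This is an illustrative plain-Python sketch under stated assumptions, not the Sample_Grid implementation: `sample_grid_1d` is a hypothetical name, and the density is assumed to vary linearly between the two edges of each bin.

```python
import bisect
import itertools
import math
import random


def sample_grid_1d(edges, dens, nsamp, rng=None):
    """Draw `nsamp` points from a 1D gridded density (hypothetical helper)."""
    rng = rng or random.Random()
    widths = [edges[i + 1] - edges[i] for i in range(len(edges) - 1)]
    # Step 1: trapezoid-rule bin masses, accumulated into an (unnormalized) CDF.
    mass = [0.5 * (dens[i] + dens[i + 1]) * ww for i, ww in enumerate(widths)]
    cdf = list(itertools.accumulate(mass))
    total = cdf[-1]
    samples = []
    for _ in range(nsamp):
        # Step 2: invert the CDF to choose a bin; zero-mass bins are skipped.
        uu = rng.random() * total
        idx = min(bisect.bisect_right(cdf, uu), len(mass) - 1)
        # Step 3: place the point within the bin, following the density gradient.
        d0, d1 = dens[idx], dens[idx + 1]
        vv = rng.random()
        if d0 == d1:
            frac = vv  # flat density: uniform within the bin
        else:
            # Inverse CDF of a linear density on the unit interval.
            frac = (-d0 + math.sqrt(d0 * d0 + vv * (d1 * d1 - d0 * d0))) / (d1 - d0)
        samples.append(edges[idx] + frac * widths[idx])
    return samples


edges = [0.0, 1.0, 2.0, 3.0]
dens = [0.0, 0.0, 2.0, 0.0]   # first bin has zero mass
samples = sample_grid_1d(edges, dens, 1000, rng=random.Random(42))
```

In more than one dimension, the same bin selection would run on the flattened mass array, with each chosen index mapped back to ND bin coordinates before step 3.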

Initialize Sample_Grid with the given grid edges and probability distribution.

Parameters:
edges : array_like

Bin edges along each dimension.

dens : array_like

Probability density evaluated at grid edges.

mass : array_like or None

Probability mass (i.e. number of samples) for each bin. Evaluated at bin centers or centroids. If no mass is given, it is calculated by integrating dens over each bin using the trapezoid rule. See: _init_data().

property grid
sample(nsamp=None, interpolate=True, return_scalar=None)

Sample from the probability distribution.

Parameters:
nsamp : scalar or None
interpolate : bool
return_scalar : bool
Returns:
vals : (D, N) ndarray of scalar
class kalepy.sample.Sample_Outliers(edges, dens, threshold=10.0, **kwargs)

Bases: Sample_Grid

Sample outliers from a given probability distribution evaluated on a regular grid.

“Outliers” are points in areas of low probability mass, which are drawn randomly. “Inliers” are bins with high probability mass, which are assumed to be well represented by the centroid of those bins. The threshold parameter determines the dividing point between low and high probability masses.

The grid has probability densities (dens) evaluated at the grid edges, and probability masses (mass) corresponding to the centroid of each bin. The centroids are calculated from the edge positions, weighted by probability density. If mass is not given, it is calculated by integrating the densities over each bin (using the trapezoid rule).

Process for drawing ‘N’ samples from the distribution:

  1. Bins with ‘low’ probability mass (i.e. mass < threshold) are sampled in the same way as the super-class Sample_Grid. These values are given a weight of 1.0.

  2. Bins with ‘high’ probability mass (mass > threshold) are all used (i.e. with no stochasticity), where the location of each sample point is the bin centroid (i.e. grid points weighted by probability density), and the weight is the total bin mass.
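The split between sampled outliers and weighted inliers can be sketched in 1D. This is a simplified plain-Python illustration, not the Sample_Outliers implementation: `sample_outliers_1d` is a hypothetical name, `threshold` is assumed non-negative, bins with mass exactly equal to the threshold are treated as low-mass, the number of outlier points is assumed to be the rounded total low mass, and outliers are placed uniformly within their bin instead of following the density gradient.

```python
import random


def sample_outliers_1d(edges, dens, threshold, rng=None):
    """Centroids for high-mass bins, random draws for low-mass bins (sketch)."""
    rng = rng or random.Random()
    nbins = len(edges) - 1
    mass = [
        0.5 * (dens[i] + dens[i + 1]) * (edges[i + 1] - edges[i])
        for i in range(nbins)
    ]
    vals, weights = [], []
    # Inliers: one deterministic point per high-mass bin, weighted by its mass.
    for i in range(nbins):
        if mass[i] > threshold:
            wsum = dens[i] + dens[i + 1]
            vals.append((edges[i] * dens[i] + edges[i + 1] * dens[i + 1]) / wsum)
            weights.append(mass[i])
    # Outliers: unit-weight points drawn from low-mass bins in proportion to mass.
    low = [i for i in range(nbins) if mass[i] <= threshold]
    low_mass = [mass[i] for i in low]
    n_out = round(sum(low_mass))
    if n_out > 0:
        for i in rng.choices(low, weights=low_mass, k=n_out):
            vals.append(rng.uniform(edges[i], edges[i + 1]))
            weights.append(1.0)
    return vals, weights


edges = [0.0, 1.0, 2.0, 3.0]
dens = [1.0, 1.0, 30.0, 30.0]   # bin masses: 1.0, 15.5, 30.0
vals, weights = sample_outliers_1d(edges, dens, 10.0, rng=random.Random(0))
```

In this example the weights sum to the total mass of the grid, so weighted statistics over the returned points approximate those of the full distribution while using far fewer points in the dense bins.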

Initialize Sample_Grid with the given grid edges and probability distribution.

Parameters:
edges : array_like

Bin edges along each dimension.

dens : array_like

Probability density evaluated at grid edges.

mass : array_like or None

Probability mass (i.e. number of samples) for each bin. Evaluated at bin centers or centroids. If no mass is given, it is calculated by integrating dens over each bin using the trapezoid rule. See: _init_data().

sample(poisson_inside=False, poisson_outside=False, **kwargs)

Outlier sample the distribution.

Parameters:
poisson_inside : bool
Returns:
nsamp : int
vals : (D, N) ndarray

Sampled values with N samples, and values for D dimensions.

weights : (N,) ndarray

Weights of the sampled values.

kalepy.sample.sample_grid(edges, dens, nsamp=None, mass=None, scalar_dens=None, scalar_mass=None, squeeze=None, **sample_kwargs)

Draw samples following the given distribution.

Parameters:
edges : (D,) list/tuple of array_like

Edges of the (parameter space) grid. For D dimensions, this is a list/tuple of D entries, where each entry is an array_like of scalars giving the grid-points along that dimension. For example, edges=([x, y], [a, b, c]) is a (2x3) dim array with coordinates: [(x,a), (x,b), (x,c)], [(y,a), (y,b), (y,c)].

dens : (N1,…,ND) array_like of scalar

Distribution values specified at either the grid edges or the grid centers. For example, for the (2x3) grid above, dens should have shape either (2, 3) for values at edges or (1, 2) for values at centers.
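The (2x3) example can be spelled out with the standard library. This is a small illustrative sketch; the string labels stand in for numeric grid-point values, and none of the variable names come from kalepy.

```python
import itertools

# Two grid-points along the first dimension, three along the second.
edges = (["x", "y"], ["a", "b", "c"])

# Shape of a distribution array specified at the grid edges.
edges_shape = tuple(len(ee) for ee in edges)

# Shape of a distribution array specified at the grid (bin) centers.
centers_shape = tuple(nn - 1 for nn in edges_shape)

# Every grid coordinate, in row-major order, grouped by the first dimension.
points = list(itertools.product(*edges))
ncol = edges_shape[1]
rows = [points[i * ncol:(i + 1) * ncol] for i in range(edges_shape[0])]
```

Here `rows` holds one list per value of the first dimension, matching the coordinate layout described for `edges`.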

nsamp : int or None

Number of samples to draw (floats are cast to integers).

scalar : None, or array_like of scalar

Scalar values to associate with the given distribution. Can be specified at either grid-centers or grid-edges, but the latter will be averaged down to grid-center values.
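Averaging edge values down to centers can be pictured on a 2D grid. This is a hedged sketch that assumes a plain arithmetic mean of the four corner values of each cell; `edges_to_centers_2d` is a hypothetical helper, and the averaging actually used by the library may differ.

```python
def edges_to_centers_2d(vals):
    """Average the four corner values of each cell (hypothetical helper)."""
    nrow, ncol = len(vals), len(vals[0])
    return [
        [
            0.25 * (vals[i][j] + vals[i + 1][j] + vals[i][j + 1] + vals[i + 1][j + 1])
            for j in range(ncol - 1)
        ]
        for i in range(nrow - 1)
    ]


# Scalar values on a (2x3) grid of edges reduce to a (1x2) grid of centers.
scalar_edges = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
scalar_centers = edges_to_centers_2d(scalar_edges)
```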

sample_kwargs : additional keyword-arguments, optional

Additional arguments passed to the Sample_Grid.sample() method.

Returns:
vals : (D, N) array of sample points

Sample points drawn from the given distribution in D dimensions. The number of points N is that specified by the nsamp parameter.

[weights] : (N,) array of weights, returned if scalar is given

Scalar factors for each sample point.

kalepy.sample.sample_grid_proportional(edges, dens, portion, nsamp, mass=None, **sample_kwargs)
kalepy.sample.sample_outliers(edges, data, threshold, nsamp=None, mass=None, **sample_kwargs)

Sample a PDF randomly in low-density regions, and with weighted points at high-densities.

Selects (semi-)random samples from the given PDF. In high-density regions, bin centroids are used as representative points and receive a corresponding (large) weight. Low-density regions are sampled proportionally with actual (weight = one) points.

Parameters:
edges : list/tuple of array_like

An iterable containing the grid edges for each dimension of the space.

data : ndarray

Array giving the PDF to sample.

threshold : float

Threshold mass below which true-samples should be drawn. Representative (centroid) values will be chosen for bins above this threshold.

nsamp : int, optional

Number of samples to draw.

mass : ndarray, optional

Probability mass function determining the number of samples to draw in each bin.

Returns:
vals
weights