The histogram module

pylife.utils.histogram.combine_histogram(hist_list, method='sum')

Combine a list of histograms to one.

Parameters:
  • hist_list (list of pandas.Series) – list of histograms, each given as an interval-indexed pandas.Series

  • method (str or aggregating function) – method used for the aggregation, e.g. ‘sum’, ‘min’, ‘max’, ‘mean’, ‘std’ or any callable function that would aggregate a pandas.Series. default is ‘sum’

Returns:

histogram – The resulting histogram

Return type:

pd.Series

Raises:

ValueError – if the index levels of the histograms do not match.

Notes

Identical bins are grouped and then aggregated using method. Note that no rebinning takes place, neither before nor after the aggregation. You might consider piping your histograms through rebin_histogram() before or after combining them.
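The grouping step described above can be sketched with plain pandas. This is a minimal illustration, not necessarily how pylife implements it: the helper name combine_sketch is hypothetical, and it assumes one-dimensional histograms only and skips the index validation that combine_histogram performs.

```python
import pandas as pd


def combine_sketch(hist_list, method="sum"):
    # Hypothetical helper for illustration only.
    # Stack all histograms, then aggregate the rows that share the same bin.
    stacked = pd.concat(hist_list)
    return stacked.groupby(level=0).agg(method)


h1 = pd.Series([5.0, 10.0], index=pd.interval_range(start=0, end=2))
h2 = pd.Series([12.0, 3.0, 20.0], index=pd.interval_range(start=1, periods=3))

# Only the bin (1, 2] occurs in both inputs, so only it gets aggregated.
summed = combine_sketch([h1, h2])

# Any callable that reduces a pandas.Series to a scalar can serve as method.
spread = combine_sketch([h1, h2], method=lambda s: s.max() - s.min())
```

Bins that appear in only one input pass through unchanged, which is why no rebinning is implied by the combination itself.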

The histograms need to have compatible indices. These can be either a plain pandas.IntervalIndex for a one-dimensional histogram or a pandas.MultiIndex whose levels are all IntervalIndex for multidimensional histograms. For multidimensional histograms, the names of the index levels must match throughout the input histogram list.
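As a rough illustration of the index requirement, the following sketch builds a one-dimensional and a two-dimensional histogram index and checks them. The function is_histogram_index and the level names "range" and "mean" are assumptions made for this example, not part of pylife.

```python
import pandas as pd


def is_histogram_index(index):
    # Hypothetical check: accept an IntervalIndex, or a MultiIndex
    # whose levels are all IntervalIndex.
    if isinstance(index, pd.MultiIndex):
        return all(isinstance(level, pd.IntervalIndex) for level in index.levels)
    return isinstance(index, pd.IntervalIndex)


# One-dimensional histogram index.
one_dim = pd.interval_range(start=0, end=2)

# Two-dimensional histogram index; the level names are illustrative.
two_dim = pd.MultiIndex.from_product(
    [pd.interval_range(start=0, end=2), pd.interval_range(start=0, end=3)],
    names=["range", "mean"],
)

# An index that is not made of intervals does not qualify.
plain = pd.RangeIndex(3)
```

When combining multidimensional histograms, every input would need to carry the same level names, here "range" and "mean".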

Examples

Two one dimensional histograms:

>>> h1 = pd.Series([5., 10.], index=pd.interval_range(start=0, end=2))
>>> h2 = pd.Series([12., 3., 20.], index=pd.interval_range(start=1, periods=3))
>>> h1
(0, 1]     5.0
(1, 2]    10.0
dtype: float64
>>> h2
(1, 2]    12.0
(2, 3]     3.0
(3, 4]    20.0
dtype: float64
>>> combine_histogram([h1, h2])
(0, 1]     5.0
(1, 2]    22.0
(2, 3]     3.0
(3, 4]    20.0
dtype: float64
>>> combine_histogram([h1, h2], method='min')
(0, 1]     5.0
(1, 2]    10.0
(2, 3]     3.0
(3, 4]    20.0
dtype: float64
>>> combine_histogram([h1, h2], method='max')
(0, 1]     5.0
(1, 2]    12.0
(2, 3]     3.0
(3, 4]    20.0
dtype: float64
>>> combine_histogram([h1, h2], method='mean')
(0, 1]     5.0
(1, 2]    11.0
(2, 3]     3.0
(3, 4]    20.0
dtype: float64

Limitations

At the moment, additional dimensions, i.e. index levels that are not histogram bins, are not supported. This limitation might be lifted in the future.

pylife.utils.histogram.rebin_histogram(histogram, binning, nan_default=False)

Rebin a histogram to a given binning.

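Parameters:
  • histogram (pandas.Series with pandas.IntervalIndex) – the histogram data to be rebinned

  • binning (pandas.IntervalIndex or int) – the target binning or the number of target bins

  • nan_default (bool) – if True, unoccupied bins are filled with NaN, otherwise with 0.0. default is False

Returns: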

rebinned – The rebinned histogram

Return type:

pandas.Series with pandas.IntervalIndex

Raises:
  • TypeError – if the histogram or the binning do not have an IntervalIndex.

  • ValueError – if the binning is not monotonically increasing or has gaps.

Notes

The events collected in each bin of the original histogram are distributed linearly over the target bins, i.e. each target bin receives a share proportional to its overlap with the original bin.
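The linear distribution can be sketched in plain pandas as follows. This is an illustrative stand-in under simplifying assumptions, not pylife's implementation: the function name rebin_linear is hypothetical, the target binning must be an IntervalIndex, and there is no validation or nan_default handling.

```python
import pandas as pd


def rebin_linear(histogram, binning):
    # Hypothetical helper for illustration only.
    values = []
    for target in binning:
        total = 0.0
        for source, count in histogram.items():
            # Length of the overlap between the source and the target bin.
            overlap = min(source.right, target.right) - max(source.left, target.left)
            if overlap > 0:
                # The source bin contributes in proportion to the overlap.
                total += count * overlap / source.length
        values.append(total)
    return pd.Series(values, index=binning)


h = pd.Series(
    [10.0, 20.0, 30.0, 40.0],
    index=pd.interval_range(start=0.0, end=4.0, periods=4),
)

finer = rebin_linear(h, pd.interval_range(start=0.0, end=4.0, periods=8))
coarser = rebin_linear(h, pd.interval_range(start=0.0, end=4.0, periods=2))
five = rebin_linear(h, pd.interval_range(start=0.0, end=4.0, periods=5))
```

For instance, the target bin (0.8, 1.6] overlaps (0.0, 1.0] by 0.2 and (1.0, 2.0] by 0.6, so it collects 0.2 · 10 + 0.6 · 20 = 14 events.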

Examples

>>> h = pd.Series([10.0, 20.0, 30.0, 40.0], index=pd.interval_range(0.0, 4.0, 4))
>>> h
(0.0, 1.0]    10.0
(1.0, 2.0]    20.0
(2.0, 3.0]    30.0
(3.0, 4.0]    40.0
dtype: float64

Rebin to a finer binning:

>>> target_binning = pd.interval_range(0.0, 4.0, 8)
>>> rebin_histogram(h, target_binning)
(0.0, 0.5]     5.0
(0.5, 1.0]     5.0
(1.0, 1.5]    10.0
(1.5, 2.0]    10.0
(2.0, 2.5]    15.0
(2.5, 3.0]    15.0
(3.0, 3.5]    20.0
(3.5, 4.0]    20.0
dtype: float64

Rebin to a coarser binning:

>>> target_binning = pd.interval_range(0.0, 4.0, 2)
>>> rebin_histogram(h, target_binning)
(0.0, 2.0]    30.0
(2.0, 4.0]    70.0
dtype: float64

Specify the target binning just by the number of bins:

>>> rebin_histogram(h, 5)
(0.0, 0.8]     8.0
(0.8, 1.6]    14.0
(1.6, 2.4]    20.0
(2.4, 3.2]    26.0
(3.2, 4.0]    32.0
dtype: float64

Limitations

At the moment, additional dimensions, i.e. index levels that are not histogram bins, are not supported. This limitation might be lifted in the future.