The histogram module
- pylife.utils.histogram.combine_histogram(hist_list, method='sum')
Combine a list of histograms to one.
- Parameters:
hist_list (list of pandas.Series) – list of histograms, each an interval-indexed pandas.Series
method (str or aggregating function) – method used for the aggregation, e.g. ‘sum’, ‘min’, ‘max’, ‘mean’, ‘std’ or any callable that aggregates a pandas.Series. Default is ‘sum’.
- Returns:
histogram – The resulting histogram
- Return type:
pd.Series
- Raises:
ValueError – if the index levels of the histograms do not match.
Notes
Identical bins are grouped and then aggregated using method. Note that no rebinning takes place, neither before nor after the aggregation. You might consider piping your histograms through rebin_histogram() before or after combining them.
The histograms need to have compatible indices. These can be either a simple pandas.IntervalIndex for a one dimensional histogram, or a pandas.MultiIndex whose levels are all IntervalIndex for multidimensional histograms. For multidimensional histograms, the names of the index levels must match throughout the input histogram list.
Examples
Two one dimensional histograms:
>>> h1 = pd.Series([5., 10.], index=pd.interval_range(start=0, end=2))
>>> h2 = pd.Series([12., 3., 20.], index=pd.interval_range(start=1, periods=3))
>>> h1
(0, 1]     5.0
(1, 2]    10.0
dtype: float64
>>> h2
(1, 2]    12.0
(2, 3]     3.0
(3, 4]    20.0
dtype: float64
>>> combine_histogram([h1, h2])
(0, 1]     5.0
(1, 2]    22.0
(2, 3]     3.0
(3, 4]    20.0
dtype: float64
>>> combine_histogram([h1, h2], method='min')
(0, 1]     5.0
(1, 2]    10.0
(2, 3]     3.0
(3, 4]    20.0
dtype: float64
>>> combine_histogram([h1, h2], method='max')
(0, 1]     5.0
(1, 2]    12.0
(2, 3]     3.0
(3, 4]    20.0
dtype: float64
>>> combine_histogram([h1, h2], method='mean')
(0, 1]     5.0
(1, 2]    11.0
(2, 3]     3.0
(3, 4]    20.0
dtype: float64
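The grouping of identical bins described in the Notes can be mimicked with plain pandas. The following is only a sketch of that idea, not necessarily how pyLife implements it; combine_sketch is a hypothetical name, and the sketch assumes one dimensional histograms only. It also shows a callable passed as method, here the spread between the largest and the smallest value in each bin.

```python
import pandas as pd


def combine_sketch(hist_list, method="sum"):
    """Hypothetical stand-in: aggregate values that share an identical bin."""
    # Stack all histograms on top of each other, then group rows by their bin.
    return pd.concat(hist_list).groupby(level=0).agg(method)


h1 = pd.Series([5.0, 10.0], index=pd.interval_range(start=0, end=2))
h2 = pd.Series([12.0, 3.0, 20.0], index=pd.interval_range(start=1, periods=3))

# Only the bin (1, 2] occurs in both histograms, so only it is really aggregated.
summed = combine_sketch([h1, h2])

# Any callable that reduces a pandas.Series to a scalar can serve as method.
spread = combine_sketch([h1, h2], method=lambda s: s.max() - s.min())
```

Bins that occur in only one input histogram pass through unchanged for ‘sum’, ‘min’, ‘max’ and ‘mean’, which is why no rebinning is implied by combining.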
Limitations
At the moment, additional dimensions, i.e. index levels that are not histogram bins, are not supported. This limitation might be lifted in the future.
- pylife.utils.histogram.rebin_histogram(histogram, binning, nan_default=False)
Rebin a histogram to a given binning.
- Parameters:
histogram (pandas.Series with pandas.IntervalIndex) – The histogram data to be rebinned
binning (pandas.IntervalIndex or int) – The given binning or number of bins
nan_default (bool) – If True, unoccupied bins are filled with np.nan, otherwise with 0.0. Default is False.
- Returns:
rebinned – The rebinned histogram
- Return type:
- Raises:
TypeError – if the histogram or the binning does not have an IntervalIndex.
ValueError – if the binning is not monotonically increasing or has gaps.
Notes
The events collected in each bin of the original histogram are distributed linearly over the target bins, i.e. in proportion to how much of the original bin each target bin covers.
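To make the linear distribution concrete, here is a minimal sketch in plain pandas. It assumes that "linearly" means each original bin hands its count to the target bins in proportion to their overlap; rebin_linear_sketch is a hypothetical helper for illustration and not the pyLife implementation.

```python
import numpy as np
import pandas as pd


def rebin_linear_sketch(histogram, binning):
    """Hypothetical helper: share each source bin's count by overlap length."""
    values = []
    for target in binning:
        total = 0.0
        for source, count in histogram.items():
            # Length of the stretch covered by both the source and the target bin.
            overlap = min(source.right, target.right) - max(source.left, target.left)
            if overlap > 0:
                # The source bin contributes the covered fraction of its count.
                total += count * overlap / source.length
        values.append(total)
    return pd.Series(values, index=binning)


h = pd.Series([10.0, 20.0, 30.0, 40.0], index=pd.interval_range(0.0, 4.0, periods=4))

coarse = rebin_linear_sketch(h, pd.interval_range(0.0, 4.0, periods=2))
fine = rebin_linear_sketch(h, pd.interval_range(0.0, 4.0, periods=5))
```

For the five-bin target, the bin (0.8, 1.6] covers 20 % of the original bin (0.0, 1.0] and 60 % of (1.0, 2.0], so it receives 0.2 · 10 + 0.6 · 20 = 14 events.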
Examples
>>> h = pd.Series([10.0, 20.0, 30.0, 40.0], index=pd.interval_range(0.0, 4.0, 4))
>>> h
(0.0, 1.0]    10.0
(1.0, 2.0]    20.0
(2.0, 3.0]    30.0
(3.0, 4.0]    40.0
dtype: float64
Rebin to a finer binning:
>>> target_binning = pd.interval_range(0.0, 4.0, 8)
>>> rebin_histogram(h, target_binning)
(0.0, 0.5]     5.0
(0.5, 1.0]     5.0
(1.0, 1.5]    10.0
(1.5, 2.0]    10.0
(2.0, 2.5]    15.0
(2.5, 3.0]    15.0
(3.0, 3.5]    20.0
(3.5, 4.0]    20.0
dtype: float64
Rebin to a coarser binning:
>>> target_binning = pd.interval_range(0.0, 4.0, 2)
>>> rebin_histogram(h, target_binning)
(0.0, 2.0]    30.0
(2.0, 4.0]    70.0
dtype: float64
Define the target binning just by an int:
>>> rebin_histogram(h, 5)
(0.0, 0.8]     8.0
(0.8, 1.6]    14.0
(1.6, 2.4]    20.0
(2.4, 3.2]    26.0
(3.2, 4.0]    32.0
dtype: float64
Limitations
At the moment, additional dimensions, i.e. index levels that are not histogram bins, are not supported. This limitation might be lifted in the future.