The histogram module
- pylife.utils.histogram.combine_histogram(hist_list, method='sum')
Combine a list of histograms into one.
- Parameters:
hist_list (list of pandas.Series) – list of histograms, each an interval-indexed pandas.Series
method (str or aggregating function) – method used for the aggregation, e.g. ‘sum’, ‘min’, ‘max’, ‘mean’, ‘std’ or any callable that aggregates a pandas.Series. Default is ‘sum’.
- Returns:
histogram – The resulting histogram
- Return type:
pd.Series
- Raises:
ValueError – if the index levels of the histograms do not match.
Notes
Identical bins are grouped and then aggregated using method. Note that no rebinning takes place, neither before nor after the aggregation. You might consider piping your histograms through rebin_histogram() before or after combining them.
The histograms need to have compatible indices. These can be either a simple pandas.IntervalIndex for a one-dimensional histogram or a pandas.MultiIndex whose levels are all IntervalIndex for multidimensional histograms. For multidimensional histograms the names of the index levels must match throughout the input histogram list.
Examples
Two one-dimensional histograms:
>>> h1 = pd.Series([5., 10.], index=pd.interval_range(start=0, end=2))
>>> h2 = pd.Series([12., 3., 20.], index=pd.interval_range(start=1, periods=3))
>>> h1
(0, 1]     5.0
(1, 2]    10.0
dtype: float64
>>> h2
(1, 2]    12.0
(2, 3]     3.0
(3, 4]    20.0
dtype: float64
>>> combine_histogram([h1, h2])
(0, 1]     5.0
(1, 2]    22.0
(2, 3]     3.0
(3, 4]    20.0
dtype: float64
>>> combine_histogram([h1, h2], method='min')
(0, 1]     5.0
(1, 2]    10.0
(2, 3]     3.0
(3, 4]    20.0
dtype: float64
>>> combine_histogram([h1, h2], method='max')
(0, 1]     5.0
(1, 2]    12.0
(2, 3]     3.0
(3, 4]    20.0
dtype: float64
>>> combine_histogram([h1, h2], method='mean')
(0, 1]     5.0
(1, 2]    11.0
(2, 3]     3.0
(3, 4]    20.0
dtype: float64
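The grouping of identical bins described in the Notes can be approximated with plain pandas. The following is only a sketch under the assumption that the histograms are one-dimensional and interval indexed; combine_sketch is a hypothetical name and not necessarily how combine_histogram is implemented.

```python
import pandas as pd


def combine_sketch(hist_list, method="sum"):
    """Hypothetical stand-in: stack the histograms, then aggregate equal bins."""
    stacked = pd.concat(hist_list)
    # Bins that occur in more than one histogram share the same index label,
    # so grouping by the index level collects them before aggregating.
    return stacked.groupby(level=0).agg(method)


h1 = pd.Series([5.0, 10.0], index=pd.interval_range(start=0, end=2))
h2 = pd.Series([12.0, 3.0, 20.0], index=pd.interval_range(start=1, periods=3))

combined = combine_sketch([h1, h2])
print(combined)
```

Only the bin (1, 2] occurs in both inputs, so it is the only one whose value depends on the chosen method.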
Limitations
At the moment, additional dimensions, i.e. index levels that are not histogram bins, are not supported. This limitation might be lifted in the future.
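Whether a series satisfies this restriction can be checked before combining. The helper below is a hypothetical sketch (has_histogram_index is not part of pyLife) that accepts either a plain IntervalIndex or a MultiIndex whose levels are all IntervalIndex.

```python
import pandas as pd


def has_histogram_index(series):
    """Hypothetical check: True if every index level consists of histogram bins."""
    index = series.index
    if isinstance(index, pd.IntervalIndex):
        return True
    if isinstance(index, pd.MultiIndex):
        return all(isinstance(level, pd.IntervalIndex) for level in index.levels)
    return False


# Two-dimensional histogram: both levels are interval bins.
bins_2d = pd.MultiIndex.from_product(
    [pd.interval_range(0, 2), pd.interval_range(0, 3)], names=["range", "mean"]
)
hist_2d = pd.Series(1.0, index=bins_2d)

# Extra non-bin level ("channel"), which is the unsupported case.
mixed = pd.MultiIndex.from_product(
    [pd.interval_range(0, 2), ["a", "b"]], names=["range", "channel"]
)
hist_mixed = pd.Series(1.0, index=mixed)

print(has_histogram_index(hist_2d), has_histogram_index(hist_mixed))
```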
- pylife.utils.histogram.rebin_histogram(histogram, binning, nan_default=False)
Rebin a histogram to a given binning.
- Parameters:
histogram (pandas.Series with pandas.IntervalIndex) – The histogram data to be rebinned
binning (pandas.IntervalIndex or int) – The given binning or number of bins
nan_default (bool) – If True, unoccupied bins are filled with np.nan, else with 0.0. Default is False.
- Returns:
rebinned – The rebinned histogram
- Return type:
pandas.Series with pandas.IntervalIndex
- Raises:
TypeError – if the histogram or the binning do not have an IntervalIndex.
ValueError – if the binning is not monotonically increasing or has gaps.
Notes
The events collected in the bins of the original histogram are distributed linearly across the target bins, i.e. each target bin receives a share of a source bin proportional to the width by which the two overlap.
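The linear distribution can be illustrated with a small stand-alone sketch. It assumes that "linearly" means proportional to the overlapping width of source and target bin; rebin_linear is a hypothetical helper written for illustration and performs none of the validation that rebin_histogram does.

```python
import numpy as np
import pandas as pd


def rebin_linear(histogram, binning):
    """Hypothetical sketch: share each source bin by overlap width."""
    result = pd.Series(0.0, index=binning)
    for src, value in histogram.items():
        for i, tgt in enumerate(binning):
            # Width of the intersection of source and target bin.
            overlap = min(src.right, tgt.right) - max(src.left, tgt.left)
            if overlap > 0:
                result.iloc[i] += value * overlap / src.length
    return result


h = pd.Series(
    [10.0, 20.0, 30.0, 40.0], index=pd.interval_range(0.0, 4.0, periods=4)
)
rebinned = rebin_linear(h, pd.interval_range(0.0, 4.0, periods=5))
print(rebinned)
```

For the target bin (0.8, 1.6], for instance, the sketch takes 0.2 of the first source bin (2.0) and 0.6 of the second (12.0), which gives 14.0.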
Examples
>>> h = pd.Series([10.0, 20.0, 30.0, 40.0], index=pd.interval_range(0.0, 4.0, 4))
>>> h
(0.0, 1.0]    10.0
(1.0, 2.0]    20.0
(2.0, 3.0]    30.0
(3.0, 4.0]    40.0
dtype: float64
Rebin to a finer binning:
>>> target_binning = pd.interval_range(0.0, 4.0, 8)
>>> rebin_histogram(h, target_binning)
(0.0, 0.5]     5.0
(0.5, 1.0]     5.0
(1.0, 1.5]    10.0
(1.5, 2.0]    10.0
(2.0, 2.5]    15.0
(2.5, 3.0]    15.0
(3.0, 3.5]    20.0
(3.5, 4.0]    20.0
dtype: float64
Rebin to a coarser binning:
>>> target_binning = pd.interval_range(0.0, 4.0, 2)
>>> rebin_histogram(h, target_binning)
(0.0, 2.0]    30.0
(2.0, 4.0]    70.0
dtype: float64
Define the target binning just by the number of bins:
>>> rebin_histogram(h, 5)
(0.0, 0.8]     8.0
(0.8, 1.6]    14.0
(1.6, 2.4]    20.0
(2.4, 3.2]    26.0
(3.2, 4.0]    32.0
dtype: float64
Limitations
At the moment, additional dimensions, i.e. index levels that are not histogram bins, are not supported. This limitation might be lifted in the future.