entropy.py
Utility functions computing entropy of variables in time series data.
author: Chia-Hung Yang
Submitted as part of the 2019 NetSI Collabathon.

netrd.utilities.entropy.categorized_data(raw, n_bins)
Categorize data.
Each entry in the returned array is the index of the linear bin that contains the corresponding raw continuous value.
 Parameters
 raw (np.ndarray)
Array of raw continuous data.
 n_bins (int)
Number of bins, shared by all variables.
 Returns
 np.ndarray
Array of bin indices after categorizing the raw data.
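A minimal sketch of the categorization described above, assuming equally spaced bins between each column's minimum and maximum and the interval convention \(B_0 = [b_0, b_1]\), \(B_i = (b_i, b_{i+1}]\) stated under linear_bins. The function name categorize_linear is hypothetical; this is an illustration, not netrd's own implementation.

```python
import numpy as np


def categorize_linear(raw, n_bins):
    """Hypothetical sketch: map each column of `raw` to linear bin indices.

    Assumes bins B_0 = [b_0, b_1] and B_i = (b_i, b_{i+1}] for i >= 1.
    """
    raw = np.asarray(raw, dtype=float)
    if raw.ndim == 1:
        raw = raw[:, None]  # treat a 1D array as a single variable
    categorized = np.empty(raw.shape, dtype=int)
    for j in range(raw.shape[1]):
        col = raw[:, j]
        # n_bins + 1 equally spaced separators spanning this column's range
        seps = np.linspace(col.min(), col.max(), n_bins + 1)
        # side="left" returns i with seps[i-1] < x <= seps[i]; subtracting 1 gives
        # the bin index, and clipping puts the minimum value b_0 into bin 0
        idx = np.searchsorted(seps, col, side="left") - 1
        categorized[:, j] = np.clip(idx, 0, n_bins - 1)
    return categorized


x = np.array([[0.0], [0.5], [1.0], [2.0]])
print(categorize_linear(x, 2)[:, 0].tolist())  # [0, 0, 0, 1]
```

Note that the value 1.0 lands in bin 0 because it equals the separator \(b_1\), which closes \(B_0\).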

netrd.utilities.entropy.conditional_entropy(data, given)
Conditional entropy of the variables in the data, conditioned on a given set of variables.
 Parameters
 data (np.ndarray)
Array of data with the variables of interest as columns and observations as rows.
 given (np.ndarray)
Array of data with the conditioned variables as columns and observations as rows.
 Returns
 float
Conditional entropy of the variables \(\{X_i\}\) of interest conditioned on the variables \(\{Y_j\}\).
Notes
\(H(\{X_i\} \mid \{Y_j\}) = -\sum p(\{X_i\} \cup \{Y_j\}) \log_2 p(\{X_i\} \mid \{Y_j\})\)
The data of the variables must be categorical.

netrd.utilities.entropy.entropy_from_seq(var)
Return the Shannon entropy of a variable. Unlike SciPy's entropy function, this takes a sequence of observations as input rather than a histogram or probability distribution.
 Parameters
 var (ndarray)
1D array of observations of the variable.
Notes
\(H(X) = -\sum p(X) \log_2 p(X)\)
Data of the variable must be categorical.
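A minimal sketch of entropy computed from a sequence of observations: count each distinct category, convert the counts to empirical probabilities, and apply the formula in the Notes. The function name is hypothetical.

```python
import numpy as np


def shannon_entropy_of_sequence(var):
    """Hypothetical sketch: Shannon entropy (bits) of a 1D categorical sequence."""
    _, counts = np.unique(np.asarray(var), return_counts=True)
    p = counts / counts.sum()  # empirical probability of each observed category
    return float(-np.sum(p * np.log2(p)))


print(shannon_entropy_of_sequence([0, 1, 0, 1]))          # 1.0
print(shannon_entropy_of_sequence(["a", "b", "c", "d"]))  # 2.0
```

Passing the counts to SciPy instead, as scipy.stats.entropy(counts, base=2), should give the same value; the only difference is whether the caller or the function builds the histogram.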

netrd.utilities.entropy.joint_entropy(data)
Joint entropy of all variables in the data.
 Parameters
 data (np.ndarray)
Array of data with variables as columns and observations as rows.
 Returns
 float
Joint entropy of the variables of interest.
Notes
\(H(\{X_i\}) = -\sum p(\{X_i\}) \log_2 p(\{X_i\})\)
The data of variables must be categorical.
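For several variables, each row of data is one joint observation, so \(p(\{X_i\})\) can be estimated from the frequencies of distinct rows. A sketch under that assumption, with a hypothetical function name:

```python
import numpy as np


def joint_entropy_sketch(data):
    """Hypothetical sketch: joint Shannon entropy (bits) of categorical columns."""
    data = np.asarray(data)
    if data.ndim == 1:
        data = data[:, None]  # a single variable is a one-column table
    # Each row is one joint outcome; count how often each distinct row occurs
    _, counts = np.unique(data, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


print(joint_entropy_sketch([[0, 0], [0, 1], [1, 0], [1, 1]]))  # 2.0
print(joint_entropy_sketch([[0, 0], [1, 1], [0, 0], [1, 1]]))  # 1.0
```

In the second call the two columns are identical, so the joint entropy collapses to the entropy of a single column.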

netrd.utilities.entropy.js_divergence(P, Q)
Jensen-Shannon divergence between P and Q.
 Parameters
 P, Q (np.ndarray)
Two discrete distributions, each represented as a 1D array. They are assumed to have the same support.
 Returns
 float
The Jensen-Shannon divergence between P and Q.
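The Jensen-Shannon divergence is commonly defined as \(\mathrm{JSD}(P, Q) = \tfrac{1}{2} D_{KL}(P \| M) + \tfrac{1}{2} D_{KL}(Q \| M)\) with \(M = \tfrac{1}{2}(P + Q)\). The sketch below implements that definition with base-2 logarithms to match the \(\log_2\) used elsewhere on this page; the base netrd actually uses is not stated here, so treat it as an assumption. Inputs are normalized in case raw counts are passed, and the function names are hypothetical.

```python
import numpy as np


def _kl_divergence(p, q):
    """KL divergence D(p || q) in bits, skipping terms where p == 0."""
    mask = p > 0  # where p > 0, q (a mixture containing p) is also > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


def js_divergence_sketch(P, Q):
    """Hypothetical sketch: Jensen-Shannon divergence (bits) on a shared support."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P / P.sum()  # normalize in case counts were passed
    Q = Q / Q.sum()
    M = 0.5 * (P + Q)  # mixture distribution
    return 0.5 * _kl_divergence(P, M) + 0.5 * _kl_divergence(Q, M)


print(js_divergence_sketch([1, 0], [0, 1]))          # 1.0 (disjoint supports)
print(js_divergence_sketch([0.5, 0.5], [0.5, 0.5]))  # 0.0 (identical)
```

With base 2 the divergence is bounded between 0 and 1. Note that scipy.spatial.distance.jensenshannon returns the Jensen-Shannon distance, the square root of this divergence, so jensenshannon(P, Q, base=2) ** 2 should agree with the sketch.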

netrd.utilities.entropy.linear_bins(raw, n_bins)
Separators of linear bins for each variable in the raw data.
 Parameters
 raw (np.ndarray)
Array of raw continuous data.
 n_bins (int)
Number of bins, shared by all variables.
 Returns
 np.ndarray
Array in which each column holds the bin separators for one variable.
Notes
The bins are \(B_0 = [b_0, b_1]\) and \(B_i = (b_i, b_{i+1}]\) for \(i \geq 1\), where the \(b_i\) are the bin separators.
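A sketch of the separators, assuming they are equally spaced from each column's minimum to its maximum; the function name is hypothetical. With array-valued start and stop, np.linspace returns one column per variable, matching the return layout described above.

```python
import numpy as np


def linear_bins_sketch(raw, n_bins):
    """Hypothetical sketch: equally spaced bin separators for each column of `raw`."""
    raw = np.asarray(raw, dtype=float)
    if raw.ndim == 1:
        raw = raw[:, None]  # a single variable becomes one column
    # np.linspace accepts per-column start/stop arrays (NumPy >= 1.16) and returns
    # shape (n_bins + 1, n_variables): column j holds b_0, ..., b_{n_bins}
    return np.linspace(raw.min(axis=0), raw.max(axis=0), n_bins + 1)


print(linear_bins_sketch([[0, 10], [4, 20]], 2).tolist())
# [[0.0, 10.0], [2.0, 15.0], [4.0, 20.0]]
```

Each column has n_bins + 1 separators, which delimit n_bins intervals under the convention in the Notes.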