entropy.py¶
Utility functions computing entropy of variables in time series data.
author: Chia-Hung Yang
Submitted as part of the 2019 NetSI Collabathon.

netrd.utilities.entropy.categorized_data(raw, n_bins)[source]¶
Categorize data.
An entry in the returned array is the index of the bin of the linearly binned raw continuous data.
 Parameters:
 raw (np.ndarray)
Array of raw continuous data.
 n_bins (int)
A universal number of bins for all the variables.
 Returns:
 np.ndarray
Array of bin indices after categorizing the raw data.
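The binning can be sketched as follows. This is a minimal NumPy sketch consistent with the documented behavior, not the library's source; folding each column's maximum into the last bin is an assumption (consistent with the half-open bins described under `linear_bins`), and every column is assumed to span a nonzero range.

```python
import numpy as np

def categorized_data(raw, n_bins):
    # Map each entry to the index of its linear bin, column by column.
    raw = np.asarray(raw, dtype=float)
    lo = raw.min(axis=0)
    hi = raw.max(axis=0)
    # Scale each column onto [0, n_bins] and floor to get bin indices.
    indices = np.floor((raw - lo) / (hi - lo) * n_bins).astype(int)
    # Each column's maximum lands exactly on n_bins; fold it into the last bin.
    indices[indices == n_bins] = n_bins - 1
    return indices

data = np.array([[0.0, 10.0], [0.5, 15.0], [1.0, 20.0]])
print(categorized_data(data, 2))  # rows: [0, 0], [1, 1], [1, 1]
```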

netrd.utilities.entropy.conditional_entropy(data, given)[source]¶
Conditional entropy of variables in the data conditioned on a given set of variables.
 Parameters:
 data (np.ndarray)
Array of data with variables of interest as columns and observations as rows.
 given (np.ndarray)
Array of data with the conditioned variables as columns and observations as rows.
 Returns:
 float
Conditional entropy of the variables \(\{X_i\}\) of interest conditioned on variables \(\{Y_j\}\).
Notes
\(H(\{X_i\} \mid \{Y_j\}) = -\sum p(\{X_i\}\cup\{Y_j\}) \log_2 p(\{X_i\} \mid \{Y_j\})\)
The data of the variables must be categorical.

netrd.utilities.entropy.entropy_from_seq(var)[source]¶
Return the Shannon entropy of a variable. This differs from SciPy's entropy by taking a sequence of observations as input rather than a histogram or probability distribution.
 Parameters:
 var (ndarray)
1D array of observations of the variable.
Notes
\(H(X) = -\sum p(X) \log_2 p(X)\)
Data of the variable must be categorical.
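A minimal sketch of this tally-then-entropy computation, assuming categorical observations (not the library's source):

```python
import numpy as np

def entropy_from_seq(var):
    # Tally the observed outcomes, then apply H(X) = -sum p(X) log2 p(X).
    _, counts = np.unique(var, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy_from_seq(["a", "a", "b", "b"]))  # 1.0: a fair coin carries one bit
```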

netrd.utilities.entropy.joint_entropy(data)[source]¶
Joint entropy of all variables in the data.
 Parameters:
 data (np.ndarray)
Array of data with variables as columns and observations as rows.
 Returns:
 float
Joint entropy of the variables of interest.
Notes
\(H(\{X_i\}) = -\sum p(\{X_i\}) \log_2 p(\{X_i\})\)
The data of variables must be categorical.
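Since each row of the data is one joint outcome of the variables, the joint entropy reduces to a tally over distinct rows. A sketch under that reading (not the library's implementation):

```python
import numpy as np

def joint_entropy(data):
    # Each distinct row is one joint outcome; tally rows, then apply the formula.
    _, counts = np.unique(data, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Two perfectly correlated binary variables: joint entropy is 1 bit,
# the same as either variable alone.
data = np.array([[0, 0], [0, 0], [1, 1], [1, 1]])
print(joint_entropy(data))  # 1.0
```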

netrd.utilities.entropy.js_divergence(P, Q)[source]¶
Jensen-Shannon divergence between P and Q.
 Parameters:
 P, Q (np.ndarray)
Two discrete distributions represented as 1D arrays. They are assumed to have the same support.
 Returns:
 float
The Jensen-Shannon divergence between P and Q.
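The standard entropy-based form of the divergence, \(\mathrm{JSD}(P, Q) = H(M) - \tfrac{1}{2}(H(P) + H(Q))\) with \(M = \tfrac{1}{2}(P + Q)\), can be sketched as follows; the base-2 logarithm (so the maximum is 1) is an assumption, and this is not the library's source.

```python
import numpy as np

def js_divergence(P, Q):
    # JSD(P, Q) = H(M) - (H(P) + H(Q)) / 2, where M = (P + Q) / 2.
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    M = (P + Q) / 2

    def h(dist):
        nz = dist[dist > 0]  # convention: 0 * log2(0) = 0
        return -np.sum(nz * np.log2(nz))

    return h(M) - (h(P) + h(Q)) / 2

# Distributions with disjoint support attain the maximum of 1 bit.
print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # 1.0
```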

netrd.utilities.entropy.linear_bins(raw, n_bins)[source]¶
Separators of linear bins for each variable in the raw data.
 Parameters:
 raw (np.ndarray)
Array of raw continuous data.
 n_bins (int)
A universal number of bins for all the variables.
 Returns:
 np.ndarray
Array where a column is the separators of bins for a variable.
Notes
The bins are \(B_0 = [b_0, b_1]\) and \(B_i = (b_i, b_{i+1}]\) for \(i \geq 1\), where the \(b_i\) are the separators of the bins.
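Equally spaced separators per column can be sketched with a single `np.linspace` call over the column minima and maxima. A minimal sketch consistent with the documented return shape (one column of separators per variable), not the library's implementation:

```python
import numpy as np

def linear_bins(raw, n_bins):
    # n_bins equal-width bins need n_bins + 1 separators per column,
    # running from each column's minimum to its maximum.
    raw = np.asarray(raw, dtype=float)
    return np.linspace(raw.min(axis=0), raw.max(axis=0), n_bins + 1)

data = np.array([[0.0, 10.0], [1.0, 20.0]])
print(linear_bins(data, 2))
# Column 0 separators: 0.0, 0.5, 1.0; column 1 separators: 10.0, 15.0, 20.0
```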