Reconstruction¶
Algorithms to recosntruct a graph from time series data.
Base class¶

class
netrd.reconstruction.
BaseReconstructor
[source]¶ Base class for graph reconstruction algorithms.
The basic usage of a graph reconstruction algorithm is as follows:
>>> reconstructor = ReconstructionAlgorithm() >>> G = reconstructor.fit(TS, <some_params>) >>> # or alternately, G = reconstructor.results['graph']
Here, TS is an \(N \times L\) numpy array consisting of \(L\) observations for each of \(N\) sensors. This constrains the graphs to have integervalued nodes.
The
results
dict object, in addition to containing the graph object, may also contain objects created as a side effect of reconstructing the network, which may be useful for debugging or considering goodness of fit. What is returned will vary between reconstruction algorithms.
Available algorithms¶
All of the following algorithms inherit from BaseReconstructor
and have
the same general usage as above.
Infers dynamical causal relations. 

Uses the correlation matrix. 

Minimum spanning tree connecting the sensors. 

Applies free energy principle. 

Uses the Granger causality between nodes. 

Performs graphical lasso. 

Uses MarchenkoPastur law to remove noise. 

Uses maximum likelihood estimation. 

Uses the mutual information between nodes. 

Uses transfer entropy between sensors. 

Assumes a OrsteinUhlenbeck generative model. 

Optimizes causation entropy. 

Uses average effect from a sensor to all others. 

Uses a regularized form of the precision matrix. 

Returns a random graph (dummy class). 

Uses ThoulessAndersonPalmer mean field approximation. 
Reference¶

class
netrd.reconstruction.
BaseReconstructor
[source] Base class for graph reconstruction algorithms.
The basic usage of a graph reconstruction algorithm is as follows:
>>> reconstructor = ReconstructionAlgorithm() >>> G = reconstructor.fit(TS, <some_params>) >>> # or alternately, G = reconstructor.results['graph']
Here, TS is an \(N \times L\) numpy array consisting of \(L\) observations for each of \(N\) sensors. This constrains the graphs to have integervalued nodes.
The
results
dict object, in addition to containing the graph object, may also contain objects created as a side effect of reconstructing the network, which may be useful for debugging or considering goodness of fit. What is returned will vary between reconstruction algorithms.

class
netrd.reconstruction.
RandomReconstructor
[source]¶ Returns a random graph (dummy class).

fit
(TS, threshold_type='range', **kwargs)[source]¶ Return a random correlation matrix with a threshold.
The results dictionary also stores the weight matrix as ‘weights_matrix’ and the thresholded version of the weight matrix as ‘thresholded_matrix’.
 Parameters
 TS (np.ndarray)
array consisting of \(L\) observations from \(N\) sensors.
 threshold_type (str)
Which thresholding function to use on the matrix of weights. See netrd.utilities.threshold.py for documentation. Pass additional arguments to the thresholder using
**kwargs
.
 Returns
 G (nx.Graph)
a reconstructed graph with \(N\) nodes.


class
netrd.reconstruction.
CorrelationMatrix
[source]¶ Uses the correlation matrix.

fit
(TS, num_eigs=None, threshold_type='range', **kwargs)[source]¶ Uses the correlation matrix.
If
num_eigs
is None, perform the reconstruction using the unregularized correlation matrix. Otherwise, construct a regularized precision matrix usingnum_eigs
eigenvectors and eigenvalues of the correlation matrix. For details on the regularization method, see [1]. The results dictionary also stores the raw correlation matrix (potentially regularized) as ‘weights_matrix’ and the thresholded version of the correlation matrix as ‘thresholded_matrix’. For details see [2]. Parameters
 TS (np.ndarray)
Array consisting of \(L\) observations from \(N\) sensors
 num_eigs (int)
The number of eigenvalues to use. (This corresponds to the amount of regularization.) The number of eigenvalues used must be less than \(N\).
 threshold_type (str)
Which thresholding function to use on the matrix of weights. See netrd.utilities.threshold.py for documentation. Pass additional arguments to the thresholder using **kwargs.
 Returns
 G (nx.Graph)
a reconstructed graph.
References


class
netrd.reconstruction.
PartialCorrelationMatrix
[source]¶ Uses a regularized form of the precision matrix.

fit
(TS, index=None, drop_index=True, of_residuals=False, threshold_type='range', **kwargs)[source]¶ Uses a regularized form of the precision matrix.
The results dictionary also stores the weight matrix as ‘weights_matrix’ and the thresholded version of the weight matrix as ‘thresholded_matrix’. For details see [1].
 Parameters
 index (int, array of ints, or None)
Take the partial correlations of each pair of elements holding constant an index variable or set of index variables. If None, take the partial correlations of the variables holding constant all other variables.
 drop_index (bool)
If True, drop the index variables after calculating the partial correlations.
 of_residuals (bool)
If True, after calculating the partial correlations (presumably using a dropped index variable), recalculate the partial correlations between each variable, holding constant all other variables.
 threshold_type (str)
Which thresholding function to use on the matrix of weights. See netrd.utilities.threshold.py for documentation. Pass additional arguments to the thresholder using
**kwargs
.
 Returns
 G (nx.Graph)
a reconstructed graph.
References


class
netrd.reconstruction.
PartialCorrelationInfluence
[source]¶ Uses average effect from a sensor to all others.

fit
(TS, index=None, threshold_type='range', **kwargs)[source]¶ Uses the average effect of a series \(Z\) on the correlation between a series \(X\) and all other series.
The partial correlation influence:
\[d(X:Z) = <d(X,Y:Z)>_Y \neq X,\]where \(d(X,Y:Z) = \rho(X,Y)  \rho(X,Y:Z)\)
If an index is given, both terms become partial correlations:
\[d(X,Y:Z) ≡ ρ(X,Y:M) − ρ(X,Y:M,Z)\]The results dictionary also stores the matrix of partial correlations as ‘weights_matrix’ and the thresholded version of the partial correlation matrix as ‘thresholded_matrix’.
 Parameters
 TS (np.ndarray)
Array consisting of \(L\) observations from \(N\) sensors.
 index (int, array of ints, or None)
An index variable or set of index variables, which are assumed to be confounders of all other variables. They are held constant when calculating the partial correlations. Default to None.
 threshold_type (str):
Which thresholding function to use on the matrix of weights. See netrd.utilities.threshold.py for documentation. Pass additional arguments to the thresholder using
**kwargs
.
 Returns
 G (nx.Graph)
a reconstructed graph.
References
 1
Kenett, D. Y. et al. Dominating clasp of the financial sector revealed by partial correlation analysis of the stock market. PLoS ONE 5, e15032 (2010).
 2
Kenett, D. Y., Huang, X., Vodenska, I., Havlin, S. & Stanley, H. E. Partial correlation analysis: applications for financial markets. Quantitative Finance 15, 569–578 (2015).


class
netrd.reconstruction.
FreeEnergyMinimization
[source]¶ Applies free energy principle.

fit
(TS, threshold_type='degree', **kwargs)[source]¶ Infer internode coupling weights by minimizing a free energy over the data structure.
The results dictionary also stores the weight matrix as ‘weights_matrix’ and the thresholded version of the weight matrix as ‘thresholded_matrix’. For details see [1].
 Parameters
 TS (np.ndarray)
Array consisting of \(L\) observations from :math.`N` sensors.
 threshold_type (str)
Which thresholding function to use on the matrix of weights. See netrd.utilities.threshold.py for documentation. Pass additional arguments to the thresholder using
**kwargs
.
 Returns
 G (nx.Graph or nx.DiGraph)
a reconstructed graph.
References


class
netrd.reconstruction.
ThoulessAndersonPalmer
[source]¶ Uses ThoulessAndersonPalmer mean field approximation.

fit
(TS, threshold_type='range', **kwargs)[source]¶ Infer internode coupling weights using a ThoulessAndersonPalmer mean field approximation.
From the paper: “Similar to naive mean field, TAP works well only in the regime of large sample sizes and small coupling variability. However, this method leads to poor inference results in the regime of small sample sizes and/or large coupling variability.” For details see [1].
The results dictionary also stores the weight matrix as ‘weights_matrix’ and the thresholded version of the weight matrix as ‘thresholded_matrix’.
 Parameters
 TS (np.ndarray)
Array consisting of \(L\) observations from \(N\) sensors.
 threshold_type (str)
Which thresholding function to use on the matrix of weights. See netrd.utilities.threshold.py for documentation. Pass additional arguments to the thresholder using
**kwargs
.
 Returns
 G (nx.Graph or nx.DiGraph)
a reconstructed graph.
References


class
netrd.reconstruction.
MeanField
[source]¶ 
fit
(TS, exact=True, stop_criterion=True, threshold_type='range', **kwargs)[source]¶ Infer internode coupling weights using a mean field approximation.
From the paper: “Exact mean field (eMF) is another mean field approximation, similar to naive mean field and thouless anderson palmer. We can improve the performance of this method by adding our stopping criterion. In general, eMF outperforms nMF and TAP, but it is still worse than FEM and MLE, especially in the limit of small sample sizes and large coupling variability.” For details see [1].
The results dictionary also stores the weight matrix as ‘weights_matrix’ and the thresholded version of the weight matrix as ‘thresholded_matrix’.
 Parameters
 TS (np.ndarray)
Array consisting of \(L\) observations from \(N\) sensors.
 exact (bool)
If True, use the exact mean field approximation. If False, use the naive mean field approximation.
 stop_criterion (bool)
If True, prevent overlylong runtimes. Only applies for exact mean field.
 threshold_type (str)
Which thresholding function to use on the matrix of weights. See netrd.utilities.threshold.py for documentation. Pass additional arguments to the thresholder using
**kwargs
.
 Returns
 G (nx.Graph or nx.DiGraph)
a reconstructed graph.
References


class
netrd.reconstruction.
MaximumLikelihoodEstimation
[source]¶ Uses maximum likelihood estimation.

fit
(TS, rate=1.0, stop_criterion=True, threshold_type='degree', **kwargs)[source]¶ Infer internode coupling weights using maximum likelihood estimation methods.
The results dictionary also stores the weight matrix as ‘weights_matrix’ and the thresholded version of the weight matrix as ‘thresholded_matrix’.
 Parameters
 TS (np.ndarray)
Array consisting of \(L\) observations from \(N\) sensors.
 rate (float)
rate term in maximum likelihood
 stop_criterion (bool)
if True, prevent overlylong runtimes
 threshold_type (str)
Which thresholding function to use on the matrix of weights. See netrd.utilities.threshold.py for documentation. Pass additional arguments to the thresholder using ‘**kwargs’.
 Returns
 G (nx.Graph or nx.DiGraph)
a reconstructed graph.
References


class
netrd.reconstruction.
ConvergentCrossMapping
[source]¶ Infers dynamical causal relations.

fit
(TS, tau=1, threshold_type='range', cutoffs=[(0.95, inf)], **kwargs)[source]¶ Infer causal relation applying Takens’ Theorem of dynamical systems.
Convergent crossmapping infers dynamical causal relation between vairiables from time series data. Time series data portray an attractor manifold of the dynamical system of interest. Existing approaches of attractor reconstruction involved building the shadow manifold for a single variable \(X\), which is defined by the timelagged vectors \((X(t), X(t\tau), X(t2\tau), ..., X(t(N1)\tau))\) where \(N\) denotes number of variables in the system and \(\tau\) is an arbitrary time lagged interval. Takens’ theorem and its generalization indicated that if a variable \(X\) is causally influencing another variable \(Y\) in a dynamical system, then there exists a onetoone and localstructure preserving mapping from \(X\)’s shadow manifold to \(Y\)’s.
The convergent crossmapping algorithm first constructs the shadow data cloud (a portray of the shadow manifold) for every variable, and it then examines the mutual predictability for all pairs of them. Specifically, for a point \(u(t)\) in \(X\)’s shadow data cloud, the time indices \(\{t_1, t_2, ..., t_{N+1}\}\) of its \(N+1\) nearest neighbors are obtained, which are mapped to a neighborhood in \(Y\)’s shadow data cloud \(\{v(t_1), v(t_2), ..., v(t_{N+1})\}\). The estimate \(\hat{Y}(t)\) is computed as an average over this neighborhood, with weights decaying exponentially with corresponding distance in \(X\)’s shadow data cloud. The algorithm concludes a causal link from \(X\) to \(Y\) if correlation between \(Y\) and \(\hat{Y}\) is significant.
Furthermore, Sugihara et al.’s simulation suggested that the correlation converging to 1 as the length of time series data grows is a necessary condition for \(X\) causally affects \(Y\) in a deterministic dynamical system. The convergent crossmapping approach is thus numerically validated to infer causal relation from time series data.
The results dictionary also includes the raw Pearson correlations between elements (‘correlation_matrix’), their associated pvalues (‘pvalues_matrix’), and a matrix of the negative logarithm of the pvalues (‘weights_matrix’).
 Parameters
 TS (np.ndarray)
\(N \times L\) array consisting of \(L\) observations from \(N\) sensors.
 tau (int)
Number of time steps for a single timelag.
 Returns
 G (nx.Graph)
A reconstructed graph with \(N\) nodes.
Notes
The length of time series data must be long enough such that \(L \geq 3 + (N1)(1+\tau)\).
The \((i,j)\)th entry of the correlation matrix entails the correlation between the \(j\)th variable and its estimate from the \(i\)th variable. A similar rule applies to the pvalue matrix.
The computation complexity of this implementation is \(O(N^3 L)\).
References
 1
Sugihara et al., Detecting Causality in Complex Ecosystems, Science (2012) DOI: 10.1126/science.1227079


class
netrd.reconstruction.
MutualInformationMatrix
[source]¶ Uses the mutual information between nodes.

fit
(TS, nbins=10, threshold_type='degree', **kwargs)[source]¶ Calculates the mutual information between the probability distributions of the (binned) values of the time series of pairs of nodes.
First, the mutual information is computed between each pair of vertices. Then, a thresholding condition is applied to obtain edges.
The results dictionary also stores the weight matrix as ‘weights_matrix’ and the thresholded version of the weight matrix as ‘thresholded_matrix’.
 Parameters
 TS (np.ndarray)
Array consisting of \(L\) observations from \(N\) sensors.
 nbins (int)
number of bins for the preprocessing step (to yield a discrete probability distribution)
 threshold_type (str)
Which thresholding function to use on the matrix of weights. See netrd.utilities.threshold.py for documentation. Pass additional arguments to the thresholder using
**kwargs
.
 Returns
 G (nx.Graph)
A reconstructed graph with \(N\) nodes.


class
netrd.reconstruction.
OUInference
[source]¶ Assumes a OrsteinUhlenbeck generative model.

fit
(TS, threshold_type='range', **kwargs)[source]¶ Infers the coupling coefficients assuming a OrsteinUhlenbeck process generative model.
The results dictionary also stores the weight matrix as ‘weights_matrix’, the covariance matrix in covariance_matrix and the thresholded version of the weight matrix as ‘thresholded_matrix’.
 Parameters
 TS (np.ndarray)
Array consisting of \(L\) observations from \(N\) sensors.
 threshold_type (str)
Which thresholding function to use on the matrix of weights. See netrd.utilities.threshold.py for documentation. Pass additional arguments to the thresholder using
**kwargs
.
 Returns
 G (nx.Graph)
A reconstructed graph with \(N\) nodes.


class
netrd.reconstruction.
GraphicalLasso
[source]¶ Performs graphical lasso.

fit
(TS, alpha=0.01, max_iter=100, tol=0.0001, threshold_type='degree', **kwargs)[source]¶ Performs a graphical lasso.
For details see [1, 2].
The results dictionary also stores the covariance matrix as ‘weights_matrix’, the precision matrix as ‘precision_matrix’, and the thresholded version of the covariance matrix as ‘thresholded_matrix’.
This implementation uses scikitlearn’s implementation of the graphical lasso; for convenience two control parameters tol and max_iter are available to interface with their method.
 Parameters
 TS (np.ndarray)
Array consisting of \(L\) observations from \(N\) sensors.
 alpha (float, default=0.01)
Coefficient of penalization, higher values means more sparseness
 max_iter (int, default=100)
Maximum number of iterations.
 tol (float, default=0.0001)
Stop the algorithm when the duality gap is below a certain threshold.
 threshold_type (str)
Which thresholding function to use on the matrix of weights. See netrd.utilities.threshold.py for documentation. Pass additional arguments to the thresholder using
**kwargs
.
 Returns
 G (nx.Graph)
A reconstructed graph with \(N\) nodes.
References
 1
J. Friedman, T. Hastie, R. Tibshirani, “Sparse inverse covariance estimation with the graphical lasso”, Biostatistics 9, pp. 432–441 (2008).
 2
https://github.com/CamDavidsonPilon/GraphicalLassoinFinance


class
netrd.reconstruction.
MarchenkoPastur
[source]¶ Uses MarchenkoPastur law to remove noise.

fit
(TS, remove_largest=False, metric_distance=False, threshold_type='range', **kwargs)[source]¶ Create a correlationbased graph using MarchenkoPastur law to remove noise.
A signed graph is built by constructing a projection of the empirical correlation matrix generated from the time series data after having removed noisy components. This method combines the results presented in [1], [2], and [3].
The results dictionary also stores the weight matrix as ‘weights_matrix’ and the thresholded version of the weight matrix as ‘thresholded_matrix’.
 Parameters
 TS (np.ndarray)
\(N \times L\) array consisting of \(L\) observations from \(N\) sensors.
 remove_largest (bool), optional
If
False
, all the eigenvectors associated to the significant eigenvalues will be used to reconstruct the denoised empirical correlation matrix. IfTrue
, the eigenvector associated to the largest eigenvalue (normally known as themarket
mode, [2]) is going to be excluded from the recontruction step. metric_distance (bool), optional: IfFalse
, a signed graph is obtained. The weights associated to the edges represent the denoised correlation coefficient \(\rho_{i,j}\) between time series \(i\) and \(j\). IfTrue
, the correlation is transformed by defining a metric distance between each pair of nodes where \(d_{i,j} = \sqrt{2(1\rho_{i,j})}\) as proposed in [3]. threshold_type (str): Which thresholding function to use on the matrix of weights. See netrd.utilities.threshold.py for documentation. Pass additional arguments to the thresholder using**kwargs
.
 Returns
 G (nx.Graph)
A reconstructed graph with \(N\) nodes.
References
 1
Marchenko, V. A., & Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Matematicheskii Sbornik, 114(4), 507536. http://www.mathnet.ru/links/a8d2a49dec161f50c944d9a96298c35a/sm4101.pdf
 2
Laloux, L., Cizeau, P., Bouchaud, J. P., & Potters, M. (1999). Noise dressing of financial correlation matrices. Physical review letters, 83(7), 1467. https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.83.1467
 3
Bonanno, G., Caldarelli, G., Lillo, F., Micciche, S., Vandewalle, N., & Mantegna, R. N. (2004). Networks of equities in financial markets. The European Physical Journal B, 38(2), 363371. https://link.springer.com/article/10.1140/epjb/e2004001296
Examples
import numpy as np import networkx as nx from matplotlib import pyplot as plt from netrd.reconstruction import MarchenkoPastur N = 250 T = 300 M = np.random.normal(size=(N,T)) print('Create correlated time series') market_mode = 0.4*np.random.normal(size=(1,T)) M += market_mode sector_modes = {d: 0.5*np.random.normal(size=(1,T)) for d in range(5)} for sector_mode, vals in sector_modes.items(): M[sector_mode*50:(sector_mode+1)*50,:] += vals print('Network reconstruction step') mp_net = MarchenkoPastur() G = mp_net.fit(M, only_positive=True) G_no_market = mp_net.fit(M, only_positive=True, remove_largest=True) print('Observed noisy correlation') C = np.corrcoef(M) C[C<0] = 0 # remove negative values np.fill_diagonal(C,0) # remove selfloops G_noisy = nx.from_numpy_array(C) # create graph print('Plot observed noisy correlation graph') fig, ax = plt.subplots() nx.draw(G_noisy, ax=ax) print('Plot reconstructed correlation graph') fig, ax = plt.subplots() nx.draw(G, ax=ax) print('Plot reconstructed correlation graph without market mode') fig, ax = plt.subplots() nx.draw(G_no_market, ax=ax)


class
netrd.reconstruction.
NaiveTransferEntropy
[source]¶ Uses transfer entropy between sensors.

fit
(TS, delay_max=1, n_bins=2, threshold_type='range', **kwargs)[source]¶ Calculates the transfer entropy from i –> j.
The resulting network is asymmetric, and each element \(TE_{ij}\) represents the amount of information contained about the future states of \(i\) by knowing the past states of \(i\) and past states of \(j\). Presumably, if one time series \(i\) does not depend on the other \(j\), knowing all of i does not increase your certainty about the next state of \(i\).
The reason that this method is referred to as “naive” transfer entropy is because it appears there are much more complicated conditional mutual informations that need to be calculated in order for this method to be true to the notion of information transfer. These are implemented in state of the art algorighms, as in the Java Information Dynamics Toolkit [1].
The results dictionary also stores the weight matrix as ‘weights_matrix’ and the thresholded version of the weight matrix as ‘thresholded_matrix’.
 Parameters
 TS (np.ndarray)
array consisting of \(L\) observations from \(N\) sensors.
 delay_max (int)
the number of timesteps in the past to aggregate and average in order to get \(TE_{ij}\)
 n_bins (int)
the number of bins to turn values in the time series to categorical data, which is a preprocessing step to compute entropy.
 threshold_type (str)
Which thresholding function to use on the matrix of weights. See netrd.utilities.threshold.py for documentation. Pass additional arguments to the thresholder using
**kwargs
.
 Returns
 G (nx.Graph)
a reconstructed graph with \(N\) nodes.
References


class
netrd.reconstruction.
GrangerCausality
[source]¶ Uses the Granger causality between nodes.

fit
(TS, lag=1, threshold_type='range', **kwargs)[source]¶ Reconstruct a network based on the Granger causality. To evaluate the effect of a time series \(j\) over another, \(i\), it first evaluates the error \(e_1\) given by an autoregressive model fit with \(i\) alone. Then, it evaluates another error \(e_2\) given by an autoregressive model trained to correlate the future of \(i\) with the past of \(i\) and \(j\). The Granger causality of node \(j\) over \(i\) is simply given by \(log(var(e_1) / var(e_2))\).
It reconstructs the network by calculating the Granger causality for each pair of nodes.
 Parameters
 TS (np.ndarray)
Array consisting of \(L\) observations from \(N\) sensors.
 lag (int)
Time lag to consider.
 threshold_type (str)
Which thresholding function to use on the matrix of weights. See netrd.utilities.threshold.py for documentation. Pass additional arguments to the thresholder using
**kwargs
.
 Returns
 G (nx.Graph)
A reconstructed graph with \(N\) nodes.

static
split_data
(TS, lag)[source]¶ From a single node time series, return a training dataset with corresponding targets.
 Parameters
 TS (np.ndarray)
Array consisting of \(L\) observations from \(N\) sensors.
 lag (int)
Time lag to consider.
 Returns
 inputs (np.ndarray)
Training data for the inputs.
 targets (np.ndarray)
Training data for the targets.


class
netrd.reconstruction.
OptimalCausationEntropy
[source]¶ Optimizes causation entropy.

fit
(TS, n_bins=40, atol=1e06, **kwargs)[source]¶ Reconstruct causal parents of nodes by optimizing causation entropy.
Optimal causation entropy method reconstructs parents of nodes in a causal diagram for systems that rest on three Markov assumptions: Let \(X_t\) be the system state at time \(t\), denote node \(i\)’s causal parents as \(N_i\) and its state as \(X_t^{(i)}\). The following three statements hold for every node \(i\):
\(P(X_t  X_{t1}, X_{t2}, ...) = P(X_t  X_{t1}) = P(X_{t'}  X_{t'1})\)
\(P(X_t^{(i)}  X_{t1}) = P(X_t^{(i)}  X_{t1}^{(N_i)})\)
\(P(X_t^{(i)}  X_{t1}^{(J)}) \neq P(X_t^{(i)  X_{t1}^{(K)}})\) whenever \(J, K\) are sets of nodes such that \(J \cap N_i \neq K \cap N_i\)
Sun et al. proved that for any set of nodes \(I\) in systems satisfying the above three conditions, its causal parents \(N_I\) is the minimal set of nodes \(K\) that maximizes the causation entropy \(C_{K \rightarrow I}\). The more general form of causation entropy is defined as \(C_{J \rightarrow I  K} = H(X_{t+1}^{(I)}  X_t^{(K)})  H(X_{t+1}^{(I)}  X_t^{(K)}, X_t^{(J)})\) where \(H(XY)\) is the conditional entropy of \(X\) conditioned on \(Y\). Sun et al. also showed that the causal parents \(N_I\) can be efficiently found by first building a superset \(S \supset N_I\) via heuristic and then removing noncausal nodes in \(S\). The causal diagram can hence be reconstructed from time series data by applying the proposed algorithm to every node.
The results dictionary stores the causal parents of individual nodes in ‘parents’ and the raw adjacency matrix in ‘adjacency_matrix’.
 Parameters
 TS (np.ndarray)
\(N \times L\) array consisting of \(L\) observations from \(N\) sensors.
 data (np.ndarray)
Array of data with nodes as columns and observations of quantity on nodes as rows.
 n_bins (int)
Number of bins when transforming continuous data into its binned categorical version (universal for all nodes).
 atol (float)
Absolute tolerance to determine whether causalentropy is closed to zero.
 Returns
 G (nx.Graph)
A reconstructed graph with \(N\) nodes.
Notes
Nodes’ causal parents can be found in
results['parents']
.Current implementation naively thresholds the causation entropy to determine whether it’s closed to zero or not. This can potentially lead to sensitivity to the tolerance hyperparameter. Sun et al. suggested to perform a permutation test for every causation entropy computed to determine its siginificance, which is more costly on computations.
References
 1
Sun et al., SIAM (2015) https://doi.org/10.1137/140956166


class
netrd.reconstruction.
CorrelationSpanningTree
[source]¶ Minimum spanning tree connecting the sensors.

fit
(TS, distance='root_inv', **kwargs)[source]¶ Create a minimum spanning tree connecting the sensors.
The empirical correlation matrix is used to first compute a distance matrix and then to create a minimum spanning tree connecting all the sensors in the data. This method implements the methodology described in [1] and applied in the context of creating a graph connecting the stocks of a portfolio of generated by looking at the correlations between the daily time series of stock prices.
The results dictionary also stores the distance matrix (computed from the correlations) as ‘distance_matrix’.
 Parameters
 Returns
 G (nx.Graph)
A reconstructed graph with \(N\) nodes.
References
 1(1,2)
Mantegna, R. N. (1999). Hierarchical structure in financial markets. The European Physical Journal BCondensed Matter and Complex Systems, 11(1), 193197. DOI https://doi.org/10.1007/s100510050929 https://link.springer.com/article/10.1007/s100510050929
 2
Bonanno, G., Caldarelli, G., Lillo, F. & Mantegna, R. N. (2003) Topology of correlationbased minimal spanning trees in real and model markets. Physical Review E 68.
 3
Vandewalle, N., Brisbois, F. & Tordoir, X. (2001) Nonrandom topology of stock markets. Quantitative Finance 1, 372–374.
Examples
import numpy as np import networkx as nx from matplotlib import pyplot as plt from netrd.reconstruction import CorrelationSpanningTree N = 25 T = 300 M = np.random.normal(size=(N,T)) print('Create correlated time series') market_mode = 0.4*np.random.normal(size=(1,T)) M += market_mode sector_modes = {d: 0.5*np.random.normal(size=(1,T)) for d in range(5)} for sector_mode, vals in sector_modes.items(): M[sector_mode*5:(sector_mode+1)*5,:] += vals print('Link node colors to sectors') colors = ['b','r','g','y','m'] node_colors = [color for color in colors for __ in range(5)] print('Network reconstruction step') cst_net = CorrelationSpanningTree() G = cst_net.fit(M) print('Plot reconstructed spanning tree') fig, ax = plt.subplots() nx.draw(G, ax=ax, node_color=node_colors)
