Distance¶
Graph distance methods to compare two networks.
Base class¶

class
netrd.distance.
BaseDistance
[source]¶ Base class for all distance algorithms.
The basic usage of a distance algorithm is as follows:
>>> dist_obj = DistanceAlgorithm() >>> distance = dist_obj.dist(G1, G2, <some_params>) >>> # or alternatively: distance = dist_obj.results['dist']
Here, G1 and G2 are
nx.Graph
objects (or subclasses such asnx.DiGraph
). The results dictionary holds the distance value, as well as any other values that were computed as a side effect.
Available distances¶
All of the following algorithms inherit from BaseDistance
and have the
same general usage as above.
JensenShannon divergence between communicability sequences. 

Compare two degree distributions. 

Compare matrices related to Fast Belief Propagation. 

Distributional Nonbacktracking Spectral Distance. 

Compare graphs based on their \(dk\)series. 

Compare two graphs by their network node dispersion. 

The Frobenius distance between their adjacency matrices. 

Entrywise disagreement between adjacency matrices. 

Combination of Hamming and IpsenMikhailov distances. 

Compares the spectrum of the Laplacian matrices. 

Jaccard distance between edge sets. 

Flexible distance able to compare the spectrum of the Laplacian in many ways. 

Compares the empirical spectral distribution of the nonbacktracking matrices. 

Compares spectral node signature distributions. 

Compares node signature distributions. 

Compares various types of feature distributions. 

Compares polynomials relating to the eigenvalues of the adjacency matrices. 

Compares graph portraits. 

Compares the spectral entropies of the density matrices. 

Compares the resistance matrices. 
Reference¶

class
netrd.distance.
Hamming
[source]¶ Entrywise disagreement between adjacency matrices.

dist
(G1, G2)[source]¶ The proportion of disagreeing nodes between the flattened adjacency matrices.
If \(u\) and \(v\) are boolean vectors, then Hamming distance is:
\[\frac{c_{01} + c_{10}}{n}\]where \(c_{ij}\) is the number of occurrences of where \(u[k] = i\) and \(v[k] = j\) for \(k < n\).
The graphs must have the same number of nodes. A small modification to this code could allow weights can be applied, but only one set of weights that apply to both graphs.
The results dictionary also stores a 2tuple of the underlying adjacency matrices in the key ‘adjacency_matrices’.
 Parameters
 G1, G2 (nx.Graph)
two networkx graphs to be compared.
 Returns
 dist (float)
the distance between G1 and G2.
References


class
netrd.distance.
Frobenius
[source]¶ The Frobenius distance between their adjacency matrices.

dist
(G1, G2)[source]¶ Frobenius distance between two graphs.
If \(a_{ij}\) and \(b_{ij}\) are the two adjacency matrices we define
\[d(G1, G2) = \sqrt{\sum_{i,j} a_{ij}  b_{ij}^2}\]The results dictionary also stores a 2tuple of the underlying adjacency matrices in the key ‘adjacency_matrices’.
 Parameters
 G1, G2 (nx.Graph)
two graphs to compare
 Returns
 float
the distance between G1 and G2
Notes
The graphs must have the same number of nodes.


class
netrd.distance.
PortraitDivergence
[source]¶ Compares graph portraits.

dist
(G1, G2, bins=None, binedges=None)[source]¶ Distance measure based on the two graphs’ “portraits”.
The results dictionary also stores a 2tuple of the underlying adjacency matrices in the key ‘adjacency_matrices’ and the portrait matrices in ‘portrait_matrices’.
 Parameters
 G1, G2 (nx.Graph)
two graphs
 bins (int)
width of bins in percentiles
 binedges (list)
vector of bin edges (mutually exclusive from bins)
 Returns
 dist (float)
the portrait divergence between two graphs.
References
[1] An informationtheoretic, allscales approach to comparing networks James P. Bagrow and Erik M. Bollt, 2018 arXiv:1804.03665


class
netrd.distance.
JaccardDistance
[source]¶ Jaccard distance between edge sets.

dist
(G1, G2)[source]¶ Compute the Jaccard index between two graphs.
The Jaccard index between two sets
\[J(A, B) = \frac{A \cap B}{A \cup B}\]provides a measure of similarity between sets. Here, we use the edge sets of two graphs. The index, a measure of similarity, is converted to a distance
\[d_J(A, B) = 1  J(A, B)\]for consistency with other graph distances.
 Parameters
 G1, G2 (nx.Graph)
two graphs to be compared.
 Returns
 dist (float)
the distance between G1 and G2.


class
netrd.distance.
IpsenMikhailov
[source]¶ Compares the spectrum of the Laplacian matrices.

dist
(G1, G2, hwhm=0.08)[source]¶ Compare the spectrum ot the associated Laplacian matrices.
The results dictionary also stores a 2tuple of the underlying adjacency matrices in the key ‘adjacency_matrices’.
 Parameters
 G1, G2 (nx.Graph)
two networkx graphs to be compared.
 hwhm (float)
half with at half maximum of the lorentzian kernel.
 Returns
 dist (float)
the distance between G1 and G2.
Notes
Requires undirected networks.
References


class
netrd.distance.
HammingIpsenMikhailov
[source]¶ Combination of Hamming and IpsenMikhailov distances.

dist
(G1, G2, combination_factor=1)[source]¶ Graph distance combining local and global distances.
The local metric H is the Hamming distance, corresponding to the difference for the edges in both networks.
The global (spectral) metric IM is the IpsenMikailov distance, corresponding to the squareroot of the squared difference of the Laplacian spectrum for each network.
The HammingIpsenMikhailov (HIM) distance is an Euclidean metric on the space created by the Cartesian product of the metric space associated with H and IM. The tradeoff between global and local information is governed by a combination factor: when this is one, local and global information are balanced; when it is zero, it reduces to the (local) Hamming distance; and as it approaches infinity it becomes the (global) IpsenMikhailov distance.
The results dictionary also stores a 2tuple of the underlying adjacency matrices in the key ‘adjacency_matrices’, the Hamming distance in ‘hamming_dist’, the IpsenMikhailov distance in ‘ipsen_mikhailov_dist’, and the Lorentzian halfwidth at halfmaximum in ‘hwhm’. If the networks being compared are directed, the augmented adjacency matrices are calculated and stored in ‘augmented_adjacency_matrices’.
 Parameters
 G1, G2 (nx.Graph)
two networkx graphs to be compared.
 combination_factor (float)
positive factor in front of the IM metric.
 Returns
 dist (float)
the distance between G1 and G2.
Notes
Requires networks with the same number of nodes. The networks can be directed and weighted (with weights in the range \([0,1]\)). Both (H and IM) are also saved in the results dictionary.
References


class
netrd.distance.
ResistancePerturbation
[source]¶ Compares the resistance matrices.

dist
(G1, G2, p=2)[source]¶ The pnorm of the difference between two graph resistance matrices.
The resistance perturbation distance changes if either graph is relabeled (it is not invariant under graph isomorphism), so node labels should be consistent between the two graphs being compared. The distance is not normalized.
The resistance matrix of a graph \(G\) is calculated as \(R = \text{diag}(L_i) 1^T + 1 \text{diag}(L_i)^T  2L_i\), where \(L_i\) is the MoorePenrose pseudoinverse of the Laplacian of \(G\).
The resistance perturbation distance between \(G_1\) and \(G_2\) is calculated as the \(p\)norm of the difference in their resitance matrices,
\[d_{r(p)} =  R^{(1)}  R^{(2)}  = ( \sum_{i,j \in V}  R^{(1)}_{i,j}  R^{(2)}_{i,j} ^p )^{1/p},\]where \(R^{(1)}\) and \(R^{(2)}\) are the resistance matrices of \(G_1\) and \(G_2\) respectively. When \(p = \infty\), we have
\[d_{r(\infty)} = \max_{i,j \in V} R^{(1)}_{i,j}  R^{(2)}_{i,j}.\]This method assumes that the input graphs are undirected; if directed graphs are used, it will coerce them to undirected graphs and emit a RuntimeWarning.
The results dictionary also stores a 2tuple of the underlying resistance matrices in the key ‘resistance_matrices’.
 Parameters
 G1, G2 (nx.Graph)
two networkx graphs to be compared.
 p (float or str, optional)
\(p\)norm to take of the difference between the resistance matrices. Specify
np.inf
to take \(\infty\)norm.
 Returns
 dist (float)
the distance between G1 and G2.
References


class
netrd.distance.
NetSimile
[source]¶ Compares node signature distributions.

dist
(G1, G2)[source]¶ A scalable approach to network similarity.
A network similarity measure based on node signature distributions.
The results dictionary includes the underlying feature matrices in ‘feature_matrices’ and the underlying signature vectors in ‘signature_vectors’.
 Parameters
 G1, G2 (nx.Graph)
two undirected networkx graphs to be compared.
 Returns
 dist (float)
the distance between G1 and G2.
References
 1
Michele Berlingerio, Danai Koutra, Tina EliassiRad, Christos Faloutsos: NetSimile: A Scalable Approach to SizeIndependent Network Similarity. CoRR abs/1209.2684 (2012)


class
netrd.distance.
NetLSD
[source]¶ Compares spectral node signature distributions.

dist
(G1, G2, normalization=None, timescales=None)[source]¶ NetLSD: Hearing the Shape of a Graph.
A network similarity measure based on spectral node signature distributions.
The results dictionary includes the underlying signature vectors in ‘signatures’.
 Parameters
 G1, G2 (nx.Graph)
two undirected networkx graphs to be compared.
 normalization (str)
type of normalization of the heat kernel vectors. either ‘complete’, ‘empty’ or ‘none’
 timescales (np.ndarray)
timescales for the comparison. None yields default.
 Returns
 dist (float)
the distance between G1 and G2.
References
 1
A. Tsitsulin, D. Mottin, P. Karras, A. Bronstein & E. Müller. NetLSD: Hearing the Shape of a Graph. KDD 2018


class
netrd.distance.
LaplacianSpectral
[source]¶ Flexible distance able to compare the spectrum of the Laplacian in many ways.

dist
(G1, G2, normed=True, kernel='normal', hwhm=0.011775, measure='jensenshannon', k=None, which='LM')[source]¶ Graph distances using different measure between the Laplacian spectra of the two graphs
The spectra of both Laplacian matrices (normalized or not) is computed. Then, the discrete spectra are convolved with a kernel to produce continuous ones. Finally, these distribution are compared using a metric.
The results dictionary also stores a 2tuple of the underlying adjacency matrices in the key ‘adjacency_matrices’, the Laplacian matrices in ‘laplacian_matrices’, the eigenvalues of the Laplacians in ‘eigenvalues’. If the networks being compared are directed, the augmented adjacency matrices are calculated and stored in ‘augmented_adjacency_matrices’.
 Parameters
 G1, G2 (nx.Graph)
two networkx graphs to be compared.
 normed (bool)
If True, uses the normalized laplacian matrix, otherwise the raw laplacian matrix is used.
 kernel (str)
kernel to obtain a continuous spectrum. Choices available are ‘normal’, ‘lorentzian’, or None. If None is chosen, the discrete spectrum is used instead, and the measure is simply the euclidean distance between the vector of eigenvalues for each graph.
 hwhm (float)
halfwidth at halfmaximum for the kernel. The default value is chosen such that the standard deviation for the normal distribution is \(0.01\), as in reference [1]. This option is relevant only if kernel is not None.
 measure (str)
metric between the two continuous spectra. Choices available are ‘jensenshannon’ or ‘euclidean’. This option is relevant only if kernel is not None.
 k (int)
number of eigenvalues kept for the (discrete) spectrum, also used to create the continuous spectrum. If None, all the eigenvalues are used. k must be smaller (strictly) than the size of both graphs.
 which (str)
if k is not None, this option specifies the eigenvalues that are kept. See the choices offered by scipy.sparse.linalg.eigsh. The largest eigenvalues in magnitude are kept by default.
 Returns
 dist (float)
the distance between G1 and G2.
Notes
The methods are usually applied to undirected (unweighted) networks. We however relax this assumption using the same method proposed for the HammingIpsenMikhailov. See [2].
References


class
netrd.distance.
PolynomialDissimilarity
[source]¶ Compares polynomials relating to the eigenvalues of the adjacency matrices.

dist
(G1, G2, k=5, alpha=1)[source]¶ Compares the polynomials of the eigenvalue decomposition of two adjacency matrices.
Note that the \(ij\)th element of where \(A^k\) corresponds to the number of paths of length \(k\) between nodes \(i\) and \(j\).
The results dictionary also stores a 2tuple of the underlying adjacency matrices in the key ‘adjacency_matrices’.
 Parameters
 G1, G2 (nx.Graph)
two networkx graphs to be compared.
 k (float)
maximum degree of the polynomial
 alpha (float)
weighting factor
 Returns
 dist (float)
Polynomial Dissimilarity between G1, G2
References
 1
Donnat, Claire, and Susan Holmes. “Tracking network dynamics: A survey of distances and similarity metrics.” arXiv preprint arXiv:1801.07351 (2018).


class
netrd.distance.
OnionDivergence
[source]¶ Compares various types of feature distributions.

dist
(G1, G2, dist='lccm')[source]¶ JensenShannon divergence between the feature distributions fixed by dist.
Assumes simple graphs.
 Parameters
 G1, G2 (nx.Graph)
two networkx graphs to be compared.
 dist (str)
type of distribution divergence to output. Choices are ‘cm’, ‘ccm’, ‘lccm_node’ and ‘lccm’. The type stand for the associated random graph ensemble. ‘cm’ compares only the degree distribution. ‘ccm’ compares the networks according to the edges degreedegree distribution. ‘lccm_node’ compares the distribution of nodes according to their onion centrality (degree, coreness, and layer within core). Finally, ‘lccm’ compares the networks according to the edges joint degree, coreness and layer distribution for both endpoints.
 Returns
 dist (float)
the distance between G1 and G2.
References


class
netrd.distance.
DeltaCon
[source]¶ Compare matrices related to Fast Belief Propagation.

dist
(G1, G2, exact=True, g=None)[source]¶ DeltaCon is based on the Matsusita between matrices created from fast belief propagation (FBP) on graphs G1 and G2.
Because the FBP algorithm requires a costly matrix inversion, there is a faster, roughly linear, algorithm that gives approximate results.
 Parameters
 G1, G2 (nx.Graph)
two networkx graphs to be compared.
 exact (bool)
if True, use the slower but exact algorithm (DeltaCon_0)
 g (int)
the number of groups to use in the efficient algorithm. If exact is set to False but g is not set, the efficient algorithm will still behave like the exact algorithm, since each node is put in its own group.
 Returns
 dist (float)
the distance between G1 and G2.
References
 1
Koutra, Danai, Joshua T. Vogelstein, and Christos Faloutsos. 2013. “Deltacon: A Principled MassiveGraph Similarity Function.” In Proceedings of the 2013 SIAM International Conference on Data Mining, 162–70. Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9781611972832.18.


class
netrd.distance.
QuantumJSD
[source]¶ Compares the spectral entropies of the density matrices.

dist
(G1, G2, beta=0.1, q=None)[source]¶ Square root of the quantum \(q\)JensenShannon divergence between two graphs.
The generalized JensenShannon divergence compares two graphs b √(H0  0.5 * (H1 + H2))y the spectral entropies of their quantumstatisticalmechanical density matrices. It can be written as
\[\mathcal{J}_q(\mathbf{\rho}  \mathbf{\sigma}) = S_q\left( \frac{\mathbf{\rho} + \mathbf{\sigma}}{2} \right)  \frac{1}{2} [S_q(\mathbf{\rho}) + S_q(\mathbf{\sigma})],\]where \(\mathbf{\rho}\) and \(\mathbf{\sigma}\) are density matrices and \(q\) is the order parameter.
The density matrix
\[\mathbf{\rho} = \frac{e^{\beta\mathbf{L}}}{Z},\]where
\[Z = \sum_{i=1}^{N}e^{\beta\lambda_i(\mathbf{L})}\]and \(\lambda_i(\mathbf{L})\) is the \(i\), represents an imaginary diffusion process over the network with time parameter \(\beta > 0\).
For these density matrices and the mixture matrix, we calculate the Rényi entropy of order \(q\)
\[S_q = \frac{1}{1q} \log_2 \sum_{i=1}^{N}\lambda_i(\mathbf{\rho})^q,\]or, if \(q=1\), the Von Neumann entropy
\[S_1 =  \sum_{i=1}^{N}\lambda_i(\mathbf{\rho})\log_2\lambda_i(\mathbf{\rho}).\]Note that this implementation is not exact because the matrix exponentiation is performed using the Padé approximation and because of imprecision in the calculation of the eigenvalues of the density matrix.
 Parameters
 G1, G2 (nx.Graph)
two networkx graphs to be compared
 beta (float)
time parameter for diffusion propagator
 q (float)
order parameter for Rényi entropy. If None or 1, use the Von Neumann entropy (i.e., Shannon entropy) instead.
 Returns
 dist (float)
the distance between G1 and G2.
References
 1
De Domenico, Manlio, and Jacob Biamonte. 2016. “Spectral Entropies as InformationTheoretic Tools for Complex Network Comparison.” Physical Review X 6 (4). https://doi.org/10.1103/PhysRevX.6.041062.


class
netrd.distance.
CommunicabilityJSD
[source]¶ JensenShannon divergence between communicability sequences.

dist
(G1, G2)[source]¶ Compares the communicability matrix of two graphs.
This distance is based on the communicability matrix, \(C\), of a graph consisting of elements \(c_{ij}\) which are values corresponding to the numbers of shortest paths of length \(k\) between nodes \(i\) and \(j\).
The commmunicability matrix is symmetric, which means the communicability sequence is formed by flattening the upper triangular of \(C\), which is then normalized to create the communicability sequence, \(P\).
The communicability sequence entropy distance between two graphs, G1 and G2, is the JensenShannon divergence between these communicability sequence distributions, \(P1\) and \(P2\) of the two graphs.
 Parameters
 G1, G2 (nx.Graph)
two graphs
 Returns
 dist (float)
between zero and one, this is the communicability sequence distance bewtween G1 and G2.
Notes
This function uses the networkx approximation of the communicability of a graph, nx.communicability_exp, which requires G1 and G2 to be simple undirected networks. In addition to the final distance scalar, self.results stores the two vectors \(P1\) and \(P2\), their mixed vector, \(P0\), and their associated entropies.
References
 1
Estrada, E., & Hatano, N. (2008). Communicability in complex networks. Physical Review E, 77(3), 036111. https://journals.aps.org/pre/abstract/10.1103/PhysRevE.77.036111
 2
Chen, D., Shi, D. D., Qin, M., Xu, S. M., & Pan, G. J. (2018). Complex network comparison based on communicability sequence entropy. Physical Review E, 98(1), 012319.


class
netrd.distance.
DistributionalNBD
[source]¶ Distributional Nonbacktracking Spectral Distance.
Computes the distance between two graphs using the empirical spectral density of the nonbacktracking operator.
See: “Graph Comparison via the Nonbacktracking Spectrum” A. Mellor & A. Grusovin arXiv:1812.05457 / 10.1103/PhysRevE.99.052309

VECTOR_DISTANCES
= {'chebyshev': <function chebyshev>, 'euclidean': <function euclidean>}¶

dist
(G1, G2, sparse=False, shave=True, keep_evals=True, k=None, vector_distance='euclidean', **kwargs)[source]¶ Distributional Nonbacktracking Spectral Distance.
 Parameters
 G1, G2 (nx.Graph)
The two graphs to compare.
 sparse (bool)
If sparse, matrices and eigenvalues found using sparse methods. If sparse, parameter ‘k’ should also be specified. Default: False
 k (int)
The number of largest eigenvalues to be calculated for the spectral density.
 vector_distance (str)
The distance measure used to compare two empirical distributions. Currently available are ‘euclidean’ and ‘chebyshev’, implemented using SciPy. Default: ‘euclidean’
 keep_evals (bool)
If True, stores the eigenvalues of the reduced nonbacktracking matrix in self.results[‘evals’] Default: False
 Returns
 float
The distance between G1 and G2


class
netrd.distance.
dkSeries
[source]¶ Compare graphs based on their \(dk\)series.

dist
(G1, G2, d=2)[source]¶ Compute the distance between two graphs by using the JensenShannon divergence between the \(dk\)series of the graphs.
The \(dk\)series of a graph is the collection of distributions of size \(d\) subgraphs, where nodes are labelled by degrees. For simplicity, we currently consider only the \(1k\)series, i.e., the degree distribution, or the \(2k\)series, i.e., the distribution of edges between nodes of degree \((k_i, k_j)\). The distance between these \(dk\)series is calculated using the JensenShannon divergence.
 Parameters
 G1, G2 (nx.Graph)
two networkx graphs to be compared
 d (int)
the size of the subgraph to consider
 Returns
 dist (float)
the distance between G1 and G2.
References
 1
Orsini, Chiara, Marija M. Dankulov, Pol ColomerdeSimón, Almerima Jamakovic, Priya Mahadevan, Amin Vahdat, Kevin E. Bassler, et al. 2015. “Quantifying Randomness in Real Networks.” Nature Communications 6 (1). https://doi.org/10.1038/ncomms9627.


class
netrd.distance.
DMeasure
[source]¶ Compare two graphs by their network node dispersion.

dist
(G1, G2, w1=0.45, w2=0.45, w3=0.1, niter=50)[source]¶ The DMeasure is a comparison of structural dissimilarities between graphs.
The key concept is the network node dispersion
\[NND(G) = \frac{\mathcal{J}(\mathbf{P}_1,\ldots,\mathbf{P}_N)}{\log(d+1)},\]where \(\mathcal{J}\) is the JensonShannon divergence between \(N\) nodedistance distributions
\[\mathbf{P}_i = \{p_i(j)\},\]and \(p_i(j)\) is the fraction of nodes at distance \(i\) from node \(j\).
The Dmeasure itself is a weighted sum of three components: the square root of the JensenShannon divergence between the average nodedistance probabilities of the two graphs
\[\mu_j = \frac{1}{N}\sum_{i=1}^N p_i(j),\]the second term is the absolute value of the differences in the square roots of the network node dispersions of the two graphs, and the third term is the sum of the square roots of the JensenShannon divergences between the probability distributions of the alpha centralities of two graph and of their complements.
 Parameters
 G1 (nx.Graph):
the first graph to be compared.
 G2 (nx.Graph):
the second graph to be compared.
 w1 (float):
weight of the first term in the calculation; with w2 and w3, must sum to 1.0.
 w2 (float):
weight of the second term in the calculation; with w1 and w3, must sum to 1.0.
 w3 (float):
weight of the third term in the calculation; with w1 d w2, must sum to 1.0.
 niter (int):
the alpha centralities are calculated using power iteration, with this many iterations
 Returns
 dist (float):
between 0 and 1, the Dmeasure distance between G1 and G2
Notes
The default values for w1, w2, and w3 are from the original paper.
References
 1
Schieber, T. A. et al. Quantification of network structural dissimilarities. Nat. Commun. 8, 13928 (2017). https://www.nature.com/articles/ncomms13928


class
netrd.distance.
NonBacktrackingSpectral
[source]¶ Compares the empirical spectral distribution of the nonbacktracking matrices.
The eigenvalues are stored in the results dictionary.

dist
(G1, G2, topk='automatic', batch=100, tol=1e05)[source]¶ NonBacktracking Distance between two graphs.
 Parameters
 G1, G2 (nx.Graph)
The graphs to compare.
 topk (int or ‘automatic’)
The number of eigenvalues to compute. If ‘automatic’ (default), use only the eigenvalues that are larger than the square root of the largest eigenvalue. Note this may yield different number of eigenvalues for each graph.
 batch (int)
If topk is ‘automatic’, this is the number of eigenvalues to compute each time until the condition is met. Default \(100\).
 tol (float)
Numerical tolerance when computing eigenvalues.
 Returns
 float
The distance between G1 and G2
