Skip to content

clearing up the configuration model

January 18, 2010

It has long been known that classical Erdös-Renyi random graphs are rather limited in the types of degree distributions they can produce. The degree of any given node follows a binomial distribution, which goes over into a Poisson distribution in the sparse limit. In contrast, many real-world networks possess power law degree sequences that would be extremely rare under a Poisson distribution. To address this problem, many have turned to the so-called configuration model, which produces a random graph with an arbitrary degree sequence.

A configuration network is constructed in the following manner. Each node has a desired degree k_i, so we imagine that each node has k_i edge “stubs” attached to it. Edges are then assigned by randomly choosing two stubs and drawing an edge between them. This results in a relatively simple random graph model that can reproduce any desired degree sequence.

Yet this model presents several computational difficulties. For instance, the individual edges are not independent events, so it is difficult to obtain the probability that two particular nodes are connected. This is important because one might be interested in quantities like the average number of edges lying within a certain set of nodes, which rely on such probabilities.

The solution to these difficulties is to use a slightly different random graph model which merely fixes the expected degree sequence rather  than the actual degree sequence. This model was originally introduced by Chung and Lu, and unfortunately has remained without a name throughout much of the literature. We simply refer to it as the G(\mathbf{w}) model in keeping with the original paper.

According to the original formulation by Chung and Lu, the G(\mathbf{w}) model assigns an expected degree w_i to each node, and each possible edge exists independently with probability

p_{ij} = \frac{ w_i w_j }{\sum_k w_k}

so that the expected degree of a node is given by

\langle k_i \rangle = \sum_j p_{ij} = w_i \sum_j w_i / \sum_k w_k = w_i

as desired.

However, this standard presentation of the G(\mathbf{w}) model ignores one tiny detail: self-links or loops. One can immediately see that such edges are necessary, for without the possibility of loops the expected degree of a node becomes

\langle k_i \rangle = \sum_{j \neq i} p_{ij} = w_i \sum_{j \neq i} w_i / \sum_k w_k = w_i \left( 1 - \frac{w_i}{\sum_k w_k} \right)

and we no longer obtain a graph with the desired expected degree sequence. Of course, in the limit of infinite size (and finite degrees), the discrepancy disappears, which helps explain why this problem has generally gone unnoticed. After all, the G(\mathbf{w}) model was originally proposed in order to facilitate calculations of certain quantities in the limit of infinite size. But in any finite graph (i.e., all real-world networks), this discrepancy can have rather large effects, so we are forced to include self-links. Nor is this a radical departure from other models: the configuration model also allows for self-links (as well as multi-links) because there is a nonzero probability that two stubs from the same node will be chosen to be connected.

However, once we allow for the possibility of self-links, we encounter the tricky question of how such edges are counted toward the degree of node or the number of edges in the network. Most would agree that a self-link should still count for a single edge in the total number of edges m. As for the degree k_i, we note that the famous “handshake theorem” leads to the relation

\sum_i k_i = 2m

between the individual degrees and the total number of edges. If we wish to preserve this useful identity (as well as the handshake theorem that leads to it), we must count each self-link twice when calculating the degree of a given node. This is also necessary to obtain an oft quoted formula for the expected number of edges within a given set of nodes

\langle e_S \rangle = \frac{1}{4m} \left(\sum_{i \in S} k_i \right)^2 = \frac{d_S^2}{4m}

which is used in the derivation of the modularity function. Moreover, this counting scheme agrees with the original configuration model as well: a self-link uses up two stubs from that node, so we must count this edge twice if we are to end up with the desired degree sequence.

Yet if we agree to count each self-link twice when calculating the degree of a given node, the original probability p_{ij} becomes invalid again, because it leads to an average degree of

\langle k_i \rangle = 2 p_{ii} + \sum_{j \neq i} p_{ij} = (w_i \sum_{j} w_i + w_i^2) / \sum_k w_k = w_i \left( 1 + \frac{w_i}{\sum_k w_k} \right)

Thus, the only way to formulate the G(\mathbf{w}) model so that it is exact for finite graphs and preserves the handshake theorem is to take p_{ij} equal to

p_{ij} =\left\{\begin{array}{ll} w_i^2 / 2m & \text{if} \, i=j \\ w_i w_j / 2m & \text{else}\end{array} \right.

2 Comments leave one →
  1. Nicolas permalink
    June 14, 2012 6:07 pm

    I’ve come across your link preparing a presentation involving modularity. Whenever I deal with it “manually” I forget to include the potential self-links. Apart from that I just realized, isn’t it strange to punish a modularity assignment for the non-existing self-links of its contained nodes when the graph we’re looking at often doesn’t allow self-links, semantically? For the sake of the presentation, it’s great to have your algebraic argument in the back, so thanks :). However I wonder if the p_ij might be adapted such that everything sums up after all when excluding self-links… good stuff in any case!


  1. random graphs with communities and arbitrary degrees « Ben Good Lately?

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: