# clearing up the configuration model

It has long been known that classical Erdös-Renyi random graphs are rather limited in the types of degree distributions they can produce. The degree of any given node follows a binomial distribution, which goes over into a Poisson distribution in the sparse limit. In contrast, many real-world networks possess power law degree sequences that would be extremely rare under a Poisson distribution. To address this problem, many have turned to the so-called ** configuration model**, which produces a random graph with an arbitrary degree sequence.

A configuration network is constructed in the following manner. Each node has a desired degree , so we imagine that each node has edge “stubs” attached to it. Edges are then assigned by randomly choosing two stubs and drawing an edge between them. This results in a relatively simple random graph model that can reproduce any desired degree sequence.

Yet this model presents several computational difficulties. For instance, the individual edges are not independent events, so it is difficult to obtain the probability that two particular nodes are connected. This is important because one might be interested in quantities like the average number of edges lying within a certain set of nodes, which rely on such probabilities.

The solution to these difficulties is to use a slightly different random graph model which merely fixes the *expected* degree sequence rather than the actual degree sequence. This model was originally introduced by Chung and Lu, and unfortunately has remained without a name throughout much of the literature. We simply refer to it as the model in keeping with the original paper.

According to the original formulation by Chung and Lu, the model assigns an expected degree to each node, and each possible edge exists independently with probability

so that the expected degree of a node is given by

as desired.

However, this standard presentation of the model ignores one tiny detail: self-links or loops. One can immediately see that such edges are necessary, for without the possibility of loops the expected degree of a node becomes

and we no longer obtain a graph with the desired expected degree sequence. Of course, in the limit of infinite size (and finite degrees), the discrepancy disappears, which helps explain why this problem has generally gone unnoticed. After all, the model was originally proposed in order to facilitate calculations of certain quantities in the limit of infinite size. But in any finite graph (i.e., all real-world networks), this discrepancy can have rather large effects, so we are forced to include self-links. Nor is this a radical departure from other models: the configuration model also allows for self-links (as well as multi-links) because there is a nonzero probability that two stubs from the same node will be chosen to be connected.

However, once we allow for the possibility of self-links, we encounter the tricky question of how such edges are counted toward the degree of node or the number of edges in the network. Most would agree that a self-link should still count for a single edge in the total number of edges . As for the degree , we note that the famous “handshake theorem” leads to the relation

between the individual degrees and the total number of edges. If we wish to preserve this useful identity (as well as the handshake theorem that leads to it), we must count each self-link *twice* when calculating the degree of a given node. This is also necessary to obtain an oft quoted formula for the expected number of edges within a given set of nodes

which is used in the derivation of the modularity function. Moreover, this counting scheme agrees with the original configuration model as well: a self-link uses up two stubs from that node, so we must count this edge twice if we are to end up with the desired degree sequence.

Yet if we agree to count each self-link twice when calculating the degree of a given node, the original probability becomes invalid again, because it leads to an average degree of

Thus, the only way to formulate the model so that it is exact for finite graphs and preserves the handshake theorem is to take equal to

I’ve come across your link preparing a presentation involving modularity. Whenever I deal with it “manually” I forget to include the potential self-links. Apart from that I just realized, isn’t it strange to punish a modularity assignment for the non-existing self-links of its contained nodes when the graph we’re looking at often doesn’t allow self-links, semantically? For the sake of the presentation, it’s great to have your algebraic argument in the back, so thanks :). However I wonder if the p_ij might be adapted such that everything sums up after all when excluding self-links… good stuff in any case!