Bodong Chen

Crisscross Landscapes

Notes: Morris, M. (2003). Local Rules and Global Properties: Modeling the Emergence of Network Structure

2017-06-15


References

Citekey: @Morris2003-pc

Morris, M. (2003). Local Rules and Global Properties: Modeling the Emergence of Network Structure. In Dynamic Social Network Modeling and Analysis: Workshop Summary and Papers (pp. 174–186). National Academies Press. Retrieved from https://market.android.com/details?id=book-DOo-hq1-oF0C

Notes

Summarize: This article argues for the use of statistical modeling, especially exponential random graph models (ERGMs), in network analysis. It offers a useful diagnosis of the field's gaps (as of roughly 15 years ago) and proposes approaches to mitigating them. Important points include that models close the gap between theory and data, and that traditional network analysis has focused on descriptive measures for no apparent reason.

Reflect: This article is helpful for my thinking about modeling collaboration as a network. The connections it draws between theory and data (through models) and between simple local rules and larger network patterns are especially interesting and relevant.

Highlights

Network theory has traditionally sought to use social relations to bridge what social scientists refer to as the micro-macro gap: understanding how social structures are formed by the accumulation of simple rules operating on local relations. (p. 187)

One particularly promising approach is based on exponential random graph models (ERGM). ERGM were first applied in the context of spatial statistics, and they provide a general framework for modeling dependent data where the dependence can be thought of as a neighborhood effect. The models can be used to decompose overall network structural properties into the effects of localized interaction rules; the traditional concern of the field. An example is given using an HIV transmission network. (p. 187)

[The first question concerns the] underlying process that gives rise to the patterns of links among nodes in the population – this requires us to think about a dynamic model that links local processes to a global outcome. The second is the inverse question: how to infer the underlying process from the patterns we observe – this requires statistical methods for sampling, estimation and inference. (p. 187)

some simple local rules may lead to overall network structures that display striking regularities, like linear hierarchies, stable cliques, or stable cycles of exchange. (p. 188)

idea
one idea is to model how changing a simple rule (e.g., read 2 notes before posting 1) can affect social networks in collaboration. (p. 188)
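A toy sketch of what simulating that could look like, in Python with networkx. Everything here is a hypothetical illustration: the reads-before-posting rule, the reader-to-author ties, and all parameter values are my assumptions, not anything from Morris.

```python
import random
import networkx as nx

def simulate(n_agents=30, n_steps=300, reads_per_post=2, seed=1):
    """Hypothetical local rule: each agent must read `reads_per_post`
    existing notes before posting one new note of their own."""
    rng = random.Random(seed)
    g = nx.DiGraph()
    g.add_nodes_from(range(n_agents))
    note_authors = [rng.randrange(n_agents)]   # one seed note to start
    reads_done = [0] * n_agents                # reads since each agent's last post
    for _ in range(n_steps):
        agent = rng.randrange(n_agents)
        if reads_done[agent] < reads_per_post:
            author = rng.choice(note_authors)  # read a random existing note
            if author != agent:
                g.add_edge(agent, author)      # reading creates a reader -> author tie
            reads_done[agent] += 1
        else:
            note_authors.append(agent)         # post a new note, reset the counter
            reads_done[agent] = 0
    return g

# Vary reads_per_post and compare the resulting global structure
g = simulate()
print(nx.density(g), nx.number_weakly_connected_components(g))
```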

Whatever the reason, however, the methodology that did develop in the field of network analysis focused on static descriptive measures, rather than the dynamic model. (p. 188)

The descriptive approach (p. 188)

have become the heart of the field, providing a rich framework for thinking about networks and a wide range of summary measures to represent both the network position occupied by specific nodes, and the overall network structure (p. 188)

Model-based methods rely on the assumption of independent observations to obtain tractable likelihood-based estimation and inference. (p. 188)

Statistical Models for Networks (p. 189)

There is general agreement that the exponential family of distributions provides a good framework for modeling the probability that a random graph (or sociomatrix) X takes the observed value x. (p. 189)
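For reference, the general form of that family, in the notation commonly used for ERGMs (my rendering, not necessarily Morris's exact symbols):

$$
P(X = x) = \frac{\exp\{\theta^{\top} g(x)\}}{\kappa(\theta)}
$$

where $g(x)$ is a vector of network statistics (e.g., counts of edges, triangles, or other configurations), $\theta$ the corresponding parameters, and $\kappa(\theta) = \sum_{x'} \exp\{\theta^{\top} g(x')\}$ normalizes over all possible graphs on the node set.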

Much of the statistical theory for ERGM was developed and applied in the context of spatial statistics by Besag (1974). The simplest spatial models represent observations as points on a lattice, and assume that only the nearest neighbors have an influence on the status of a site. (p. 189)

While physical space may influence the pattern of ties, it is not the only factor. As a result, the notion of “space” in network models is largely endogenous and neighborhoods are typically not defined in terms of proximity in the matrix. (p. 190)

Because the nature of the “spatial” dependence is the primary focus of interest in network analysis, the parameters that define this dependence are not nuisance parameters, but the parameters of most interest. (p. 190)

The first application of ERGM like models to networks was Holland and Leinhardt’s p1 model for directed graphs. (p. 191)
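As a reminder to myself, the p1 model factors over dyads; in the notation usually used for it (my reconstruction, not quoted from Morris), the graph probability is

$$
P(X = x) \propto \exp\Big( \rho\, m(x) + \theta\, x_{++} + \sum_i \alpha_i x_{i+} + \sum_j \beta_j x_{+j} \Big)
$$

where $m(x)$ is the number of mutual dyads, $x_{++}$ the total number of arcs, $\alpha_i$ the expansiveness of sender $i$, and $\beta_j$ the popularity of receiver $j$.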

First, the network statistics that drive the model are simple functions of links, and the neighborhood represented by the statistics is easy to read from functional forms. (p. 191)

Second, one can also catch a glimpse of the two key mechanisms for reducing the number of parameters in these models: one is to limit the number of configurations represented as having an effect on the probability of the graph, the other is to impose “homogeneity constraints” on isomorphic configurations. (p. 191)

The next extension of ERGM in networks took on the task of modeling dyadic dependence. The ability to model dyadic dependence is the single most important feature of the ERGM framework, both because it is theoretically appealing, and because it is statistically innovative. (p. 192)

Despite the enormous flexibility the ERG framework has brought to the field, virtually all of the published applications of these models have either used a variant of the original p1 model, or the simplified Markov graph. This is somewhat curious, as both are pretty mechanical renderings of what one might think of as the “neighborhood” of influence, and neither is particularly well suited to representing the kinds of processes traditionally of interest to network modelers. (p. 193)

The lack of model development is probably due to the extremely challenging technical problems that accompany the estimation of these models. Another problem is the lack of data, or at least of the type of data currently required. (p. 193)

Finally, it may also be that our ability to think, empirically, in terms of positions and network structure has atrophied as we have waited for the right tools to become available. Whatever the reason, there has been remarkably little application of these modeling tools to data. (p. 193)

note
good point to make. (p. 193)

Models are the bridge between theory and data. (p. 193)

In the rest of this paper, I will show how these models can be used to formalize the investigation of how a global network structure might cumulate up from simple local rules in a specific case. (p. 193)

From local rules to global structures: Partnership networks and the spread of HIV (p. 194)

There is some reason to think that such local rules may govern the network structures of interest. (p. 194)

Mixing refers to assortative and disassortative biases in the joint distribution of partners’ attributes. Examples include the degree of matching on race, age, sex, social status, sexual orientation or level of sexual activity. Assortative biases can create patchy, clustered networks, which tends to increase the speed of spread within groups, and slow the spread between them. (p. 194)
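A quick sketch of how one might quantify this kind of assortative bias with networkx; the toy graph and the "group" attribute are illustrative assumptions:

```python
import networkx as nx

# Toy undirected network with a categorical attribute on each node
g = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])
nx.set_node_attributes(g, {0: "a", 1: "a", 2: "b", 3: "b"}, "group")

# +1 = perfectly assortative, 0 = random mixing, negative = disassortative
print(nx.attribute_assortativity_coefficient(g, "group"))

# The full mixing matrix (joint distribution of partners' attributes)
print(nx.attribute_mixing_matrix(g, "group"))
```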

Linking network data to network simulation requires a statistical bridge: a modeling framework that enables the key structural parameters to be estimated from network data, so that these can be used to directly drive a simulation. ERGMs have the potential to do this. (p. 195)

A Model-Based Hypothesis (p. 196)

We are now in a position to test a different kind of hypothesis about the role of networks on disease spread. (p. 196)

That is, we choose partners because they are the right sex, age, race and status, and we often care if they have other partners. (p. 196)

If simple local rules govern partner selection, then these also determine the aggregate structure in the network: what looks like an unfathomably complicated system is, in fact, produced by a few key local organizing principles. (p. 196)

idea
interesting: simple local rules that govern structure, plus actionable insights for making changes.
This example is about public health; we can apply the same thinking to the health of discourse. (p. 196)

Both mixing and concurrency are network properties that can be measured with local network sampling strategies. If it turns out that these two local rules explain most of the variation in network structure that is relevant to disease spread, then we have a simple inexpensive way to measure network vulnerability routinely in public health surveillance. And they describe simple behavioral rules that people can be taught to recognize and change. (pp. 196–197)
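Concurrency in particular is easy to measure locally. A minimal sketch, assuming partnerships are recorded as (person, person, start, end) intervals; the data layout is my assumption:

```python
# Each partnership: (person_i, person_j, start_time, end_time)
partnerships = [("A", "B", 0, 10), ("A", "C", 5, 12), ("B", "D", 11, 15)]

def concurrent_pairs(p):
    """Pairs of partnerships that share a person and overlap in time."""
    out = []
    for k, (a1, b1, s1, e1) in enumerate(p):
        for a2, b2, s2, e2 in p[k + 1:]:
            shares_person = {a1, b1} & {a2, b2}
            overlaps = s1 < e2 and s2 < e1
            if shares_person and overlaps:
                out.append(((a1, b1), (a2, b2)))
    return out

# A's ties to B and C overlap in time, so A has concurrent partnerships
print(concurrent_pairs(partnerships))
```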

So the last piece of the puzzle is to determine how to test the hypothesis that mixing and concurrency contain all of the information we need to know to evaluate the spread potential in a network. We seek a “goodness of fit” test, but one that is tailored to what we want to fit. (p. 197)

Transmission potential in a network is determined by the network connectivity. Connectivity can be measured in a number of ways, but two simple measures are the properties of reachability (is person i connected to person j by a path of some length?) and distance (what is the length of that path?). (p. 197)
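Both measures are one-liners on a graph; a sketch with networkx on an illustrative toy graph:

```python
import networkx as nx

# Toy graph with two components
g = nx.Graph([(1, 2), (2, 3), (4, 5)])

print(nx.has_path(g, 1, 3))              # reachability: True
print(nx.shortest_path_length(g, 1, 3))  # distance: 2
print(nx.has_path(g, 1, 5))              # different component: False
```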

We can also use the MCMC simulation engine to verify whether these features are sufficient for establishing the epidemic potential in a network. (p. 197)
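For my own reference, a hedged sketch of the kind of MCMC engine this refers to: a Metropolis edge-toggle sampler for a toy ERGM with edge and triangle statistics. The coefficients and sizes are illustrative assumptions, not estimates from any data.

```python
import math
import random
import networkx as nx

def ergm_sample(n=20, theta_edges=-1.5, theta_tri=0.2, steps=20000, seed=7):
    """Draw one graph from a toy ERGM with edge and triangle terms."""
    rng = random.Random(seed)
    g = nx.empty_graph(n)
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        # Change in the statistics if the dyad (i, j) were toggled on:
        # +1 edge, plus one triangle per common neighbor of i and j.
        shared = len(set(g[i]) & set(g[j]))
        sign = -1 if g.has_edge(i, j) else 1
        delta = sign * (theta_edges + theta_tri * shared)
        # Metropolis step: accept the toggle with probability min(1, exp(delta))
        if delta >= 0 or rng.random() < math.exp(delta):
            if sign == 1:
                g.add_edge(i, j)
            else:
                g.remove_edge(i, j)
    return g

g = ergm_sample()
print(g.number_of_edges(), sum(nx.triangles(g).values()) // 3)
```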