Citekey: @Goel2014

Goel, S., & Goldstein, D. G. (2014). Predicting Individual Behavior with Social Networks. Marketing Science, 33(1), 82–93. doi:10.1287/mksc.2013.0817



For over a decade, online marketers have predicted behavior at the individual level using variables such as age and sex (demographic targeting), location (geographic targeting), and website usage patterns (behavioral targeting). Today, ad servers can respond to the text of the page being viewed, be it a news story or personal email, and deliver ads on the fly that match page content (contextual targeting). (p. 2)

After so many years of advances, the baseline models for predicting consumer behavior have become strong. Nonetheless, new sources of data will continually beg the question of the degree to which targeting can be improved. (p. 2)

The main empirical and theoretical questions we address are how much information there is in the edges of a network and whether that information improves on baseline models for selecting individuals likely to undertake various actions. (p. 3)

a handful of studies in related disciplines have shown that friends of adopters are themselves more likely to adopt, even after controlling for covariates (Bhatt et al. 2010, Hill et al. 2006, Provost et al. 2009). (p. 3)

One question this previous research leaves open is whether, in practice, social network targeting is an effective strategy, as the number of people with an adopting contact may be exceedingly small. (p. 3)

To assess the value of social predictors, we examine individual-level behaviors in three diverse domains comprising 12 distinct outcomes: responding to national advertisements for 10 products and services, participating in an online recreational league with millions of players, and purchasing (off-line and online) from a national department store chain. We treat each domain in turn, and within a domain we test baseline models that range from extremely simple (using social network data or demographic data alone) to strong (adding social network data to models that include demographics as well as individuallevel transactional data). (p. 4)

  1. Response to Advertising (p. 4)

Our analysis is based on individuals within the Yahoo! communications network, where we establish an edge between all pairs of people who mutually exchanged email or instant messages during a fixed two-month period. Restricting analysis to those individuals with at least one correspondent resulted in a symmetric network of 132 million people and 719 million edges, with a mean of 11 contacts per individual. (p. 4)

The largest increase was observed for Movie 1, where people whose social contacts clicked on this ad were more than 10 times as likely (1,140%) to click than those without contacts who clicked. (p. 4)

Thus, consistent with past studies (Aral et al. 2009, Bhatt et al. 2010, Hill et al. 2006, Provost et al. 2009), contacts of adopters are themselves more likely than average to adopt, and in at least some of the campaigns we examined, this social signal is quite strong. (p. 4)

As Figure 1 indicates, the social signal allows one to construct pools of between 10,000 and 100,000 candidates who are much more likely than average to click, where the size of these pools effectively reflects the number of individuals connected to people who clicked. (p. 5)

Although we have thus far seen that social data on their own may not be particularly useful for identifying individuals likely to click on ads, it is often the case that social data are not the only resource available for targeting. As a case in point, Yahoo! collects age and sex for each of its registered users and routinely uses this information for ad targeting. (p. 5)

That contacts of those who clicked have relatively high click rates does not in itself imply that social data are valuable for prediction, in part because few people may be connected to individuals who clicked. (p. 5)

In many contexts, the central objective is to identify pools of individuals most likely to take action. (p. 5)

We thus assess the predictive value of social data via a top-k analysis (p. 5)

Given the effectiveness of demographic data to predict clicking on advertisements, do social data substantively improve on demographics in constructing pools of likely adopters? (p. 5)

To assess the marginal value of social data relative to demographic information, we generate two candidate rankings, one based only on demographic attributes and the other based (p. 5)

on both demographic and social information. (p. 6)

First, of the total candidate set of approximately eight million people, demographic information is generally useful for constructing pools of up to several million individuals that are substantially more likely than average to click. Second, even on top of this baseline, augmenting demographic information with social data often helps to identify pools of between 10,000 and 100,000 (p. 7)

  1. Recreational League Registrations (p. 7)

  2. Retail Purchases (p. 10)

a relatively small set; consequently, the reach of social targeting strategies may be small. (p. 12)

  1. Discussion and Conclusion (p. 12)

Returning to our motivating question, we find that there are a variety of circumstances in which social data are useful for identifying select groups of individuals with relatively high propensities to take various actions, from clicking on advertisements, to registering for a recreational league, to making department store purchases. Across all three domains we study, mere connection to an individual who has previously taken such an action is indicative of a higher than average propensity to act oneself. We also find that social data are not only useful in isolation but also often complement both demographic and behavioral predictors. In particular, social predictors substantially augmented demographic-based candidate selection in the shopping and fantasy football domains. (p. 12)

Furthermore, when detailed transaction data were available—as in the retail domain—we find social data provide almost no marginal benefit. More generally, it seems likely that when enough information is available at the individual level, the marginal value of network data is muted. (p. 12)

Accordingly, social data seem particularly valuable in situations where a potential target’s social network is known but information about his or her past behavior, and possibly demographic characteristics, is limited. (p. 12)

Social data have proven to be widely effective in the examples we study, but there are limits to their benefits. Specifically, when relatively few people undertake a particular action, and when individuals have few contacts, the contacts of these initial actors form (p. 12)

Finally, we note that the marketing literature we have reviewed has focused on the topic of proving, disentangling, or modeling causality in social networks. It may be tempting to conclude from our results that shopping habits or leisure activities are “contagious.” Although social influence likely plays a role in effects we find, establishing such is neither our objective nor justified from our analysis. Nevertheless, whereas the value of social data may concern both influence and homophily, our approach demonstrates that disentangling the two is not necessary for identifying and targeting likely adopters. (p. 13)

Blog Logo

Bodong Chen



Crisscross Landscapes

Bodong Chen, University of Minnesota

Back to Home