Bodong Chen

Crisscross Landscapes

Notes: Wang2016-lu-Modeling customer preferences using multidimensional network analysis in engineering design



Citekey: @Wang2016-lu

Wang, M., Chen, W., Huang, Y., Contractor, N. S., & Fu, Y. (2016). Modeling customer preferences using multidimensional network analysis in engineering design. Design Science, 2.


Summarize: This paper presents a case of using multidimensional network analysis – powered by ERGM – to model customer preferences in China’s car market. Data came from user surveys. The authors patiently set the background, introduce the techniques, compare MNA with unidimensional network analysis, and considerately present the various analytical decisions associated with these approaches.


we present a novel conceptual framework of multidimensional network analysis (MNA) for modeling customer preferences in supporting design decisions. In the proposed multidimensional customer–product network (MCPN), customer–product interactions are viewed as a socio-technical system where separate entities of ‘customers’ and ‘products’ are simultaneously modeled as two layers of a network, and multiple types of relations, such as consideration and purchase, product associations, and customer social interactions, are considered (p. 1)

Beyond the traditional descriptive analysis used in network analysis, we employ the exponential random graph model (ERGM) as a unified statistical inference framework to interpret complex preference decisions. Our approach broadens the traditional utility-based logit models by considering dependency among complex customer–product relations, including the similarity of associated products, ‘irrationality’ of customers induced by social influence, nested multichoice decisions, and correlated attributes of customers and products. (p. 1)

In recent years, disaggregate quantitative models such as discrete choice analysis (DCA) (Ben-Akiva & Lerman 1985; Train 2009) have been widely studied by the engineering design research community for consumer preference modeling. Following the random utility theory, the customer purchase decision is captured by a utility function of product attributes/features and customer attributes (e.g., social demographic and usage attributes) (Hoyle et al. 2010; He et al. 2012). Even though DCA provides a probabilistic approach for modeling customer heterogeneity, there are several major obstacles (p. 2)

We aim to develop a preference model that broadens the utility-based DCA by considering complex customer–product relations, including the similarity of associated products, ‘irrationality’ of customers induced by social influence, nested multichoice decisions, and correlated attributes of customers and products. To this end, we propose a novel MCPN framework as shown in Figure 1. (p. 3)

customer–product interactions form a complex socio-technical system (Trist 1981), not only because there are complex relations between the customers (e.g., social interactions) and amongst the products (e.g., market segmentation or product family), but also because there exist multiple types of relations between customers and products (e.g., ‘consideration’ versus ‘purchase’) (p. 3)

On one hand, the social structures can influence how people conceive a new technology (or a product), as well as whether and how they will use it (Rice & Aydin 1991; Kraut et al. 1998; Karahanna et al. 1999). On the other hand, new technologies (or products) could bring changes to social and communication relationships among people (Dimaggio et al. 2001). (p. 4)

this paper represents the first attempt to introduce MNA into engineering design. The complex customer–product interactions are represented as a multidimensional network where multiple relationships are considered, including social network relations among customers, association relations among products, as well as preference relations between customers and products. (p. 4)

the exponential random graph model (ERGM) as a unified statistical inference framework for MNA. (p. 4)

Figure 2. Development of network structures. (p. 5)

Exponential random graph models have several advantages over the utility-based logit models: (1) network links are modeled to be interdependent in ERGM rather than assumed to be independent; (2) ERGMs can incorporate binary, categorical, and continuous nodal attributes to determine whether they are associated with the formation of network links; (3) ERGMs are capable of characterizing local and global network features; (4) ERGMs can be applied in flexible ways to many different types of networks and relational data; (5) data used for fitting ERGMs can be cross-sectional or longitudinal (changes with time), and a dynamic model can be built to study the emergence and dynamics of a network; (6) in contrast to a machine learning model that focuses on prediction, ERGM is an explanatory model whose results can be used to derive behavioral theories and design implications. (p. 5)

the detailed development in this paper is devoted to modeling consideration decisions among product alternatives. (p. 5)

2. Technical background (p. 6)

2.1. Network analysis in product design and market study (p. 6)

2.2. Modeling the impact of social influence (p. 6)

A comprehensive study of how peer influence affects product attribute preference was provided by Narayan et al. (2011), who modeled three different mechanisms of social influence. (p. 6)

2.3. Advances in social network analysis (p. 7)

Development of analytical techniques to explain the emergence of networks is often motivated by the multitheoretical multilevel (MTML) framework (Monge & Contractor 2003). (p. 7)

Social network models have multilevel interpretations because the emergence of a network can be influenced, for instance, by theories of self-interest that refer to characteristics of actors (at the individual level), theories of social exchange that describe links between pairs of actors (at the dyadic level), theories of balance that explain the configuration of links among three actors (at the triadic level), and theories of collective action that explain configurations among larger aggregates of actors (at the group or network level). (p. 7)

Eq. (1) suggests that the probability of observing any particular graph (e.g. MCPN) is proportional to the exponent of a weighted combination of network characteristics: one statistic z(y) is more likely to occur if the corresponding θ is positive. (p. 7)
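For my own reference, the general ERGM form that this passage describes is usually written as below. The notation (θ for the parameter vector, z(y) for the vector of network statistics, κ(θ) for the normalizing constant over all possible graphs) follows the common ERGM convention and is assumed here, not copied from the paper's Eq. (1).

```latex
\Pr(Y = y) \;=\; \frac{\exp\{\boldsymbol{\theta}^{\top}\mathbf{z}(y)\}}{\kappa(\boldsymbol{\theta})},
\qquad
\kappa(\boldsymbol{\theta}) \;=\; \sum_{y' \in \mathcal{Y}} \exp\{\boldsymbol{\theta}^{\top}\mathbf{z}(y')\}
```

Read this way, a positive component of θ raises the probability of graphs in which the matching statistic in z(y) is large, and a negative component lowers it.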

3. A multidimensional network approach for preference modeling (p. 8)

3.1. The multidimensional customer–product network (MCPN) framework (p. 8)

The product layer contains a collection of engineering products P (e.g., vehicles, electronics and appliances, software). Products are connected by various links which can be either directed or non-directed. Directed links often involve product hierarchy or preference, while non-directed links imply product similarity or association. Product attributes or features, quantitative (e.g., fuel efficiency, horsepower) or qualitative (e.g., safety, styling), can be taken into account as nodal attributes. Similar attributes/features between products are represented as association links in the product network. Alternatively, product associations can be identified by their co-consideration relations from customers. (p. 8)

The customer layer describes a social network consisting of a customer population C who make decisions or take actions. Each customer has a unique profile (e.g., socioeconomic and anthropometric attributes, purchase history and usage context attributes, etc.) which potentially affects customer preference decisions. Links between two customers represent their social relations, such as friendship or communication. (p. 8)

As seen, the proposed MCPN framework can capture rich information on dependency in a complex socio-technical system so as to assist product design decision-making. A combined analysis of all relations mentioned above allows designers to evaluate product decisions not in isolation, but with expectation that the market system will react to the planned decisions, and any design change may easily affect other connected entities across the network in ways that were initially unintended. (p. 9)

3.2. Unidimensional network analysis of product associations (p. 9)

The unidimensional network can be viewed as a compressed but simplified version (p. 9)

Distance measures commonly used in content-based recommender systems, such as Jaccard index (Real & Vargas 1996), Cosine similarity (Chowdhury 2010), Gower similarity (Gower 1971) etc., can be applied to assess the strength of connections. (p. 9)
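A quick sketch of two of the measures named above, to remind myself how they work. The function names and the toy data (sets of customers who considered each vehicle) are my own illustration and not taken from the paper.

```python
import math


def jaccard_index(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical data: customers who considered vehicle A and vehicle B.
considered_a = {"c1", "c2", "c3"}
considered_b = {"c2", "c3", "c4"}
print(jaccard_index(considered_a, considered_b))  # 2 shared / 4 in total = 0.5
```

Either score could serve as a link weight between two products; which one the authors prefer for a given network is not something this sketch settles.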

The descriptive network analysis involves the computation of topological measures to assess the position of nodes and the implication of structural advantages. (p. 9)

Although the unidimensional network approach can describe interdependencies in relational data, the method cannot provide quantitative assessment of the impact of product attributes for a particular group of customers of interest. Further, the unidimensional network analysis studies customers’ averaged (aggregated) preference across the population. Advanced network modeling approaches that capture disaggregated preference behaviors of individual customers are needed as examined next. (p. 11)

3.3. Analyzing multidimensional network considering product associations (p. 11)

Beyond existing network approaches that are mostly descriptive in nature, we use ERGM as a unified statistical framework to analyze the MCPN (p. 12)

Exponential random graph model literature has established more than 20 different types of effects (Lusher et al. 2012) for describing the various forms of dependence that exist in the relational data within social networks. Examples of effects, their configurations, and interpretations are provided in Table 2. (p. 12)

Read this paper (p. 12)

The network effects fall into three categories: pure structural effects, attribute-relation effects involving product/customer attributes, and cross-level effects involving both between-level and within-level relations. Pure structural effects are related to the well-known structural regularities in the network literature (e.g., effects [A] to [C] in Table 2); attribute-relation effects assume the attributes of products/customers can also influence possible tie formations in a given structure (p. 12)

Compared to a unidimensional network (Section 3.2), a multidimensional network provides a more natural way to model relations between two different classes of nodes (customers and products) and the non-hierarchical association relations between products. Moreover, its capacity to preserve two types of nodes allows designers to parse out the unique contribution of different types of nodes to the overall network structure (p. 12)

3.4. Analyzing multidimensional network incorporating social influence (p. 13)

To account for the effect of social influence on customer preference decisions, we further expand the multidimensional network structure to simultaneously measure within-layer social relations, within-layer product associations, and between-layer customer–product relations (Figure 6). (p. 13)

Unlike the prior research that incorporates social influence as customer attributes, this research employs the ERGM to assess the social influence effects. In theory, compared to the use of DCA, one should draw more reliable conclusions based on the results from the network approach, because of its capability of handling correlated node attributes and interdependent link relations, which avoids faulty inferences on covariates (Cranmer & Desmarais 2011). (p. 15)

4. Case study – vehicle preference modeling (p. 15)

In this section, two implementations on modeling vehicle preferences in the growing China market are presented to demonstrate the proposed methodology. (p. 15)

4.1. Using unidimensional network (p. 15)

We develop two types of product association networks – a vehicle association network with undirected links showing the similarity of products, and a vehicle hierarchal network with directed links indicating preference hierarchies. (p. 15)

The two vehicle association networks are constructed using the 2013 New Car Buyers Survey (NCBS) data provided by an independent research institute in China. The dataset contains 49,921 new car buyers who considered and purchased from a pool of 389 vehicle models in 2013. (p. 15)

The vehicle association network is created to aid the analysis of customer consideration decisions by linking any pair of vehicles if both vehicles are considered by most consumers in his (her) consideration set. The association link is viewed as a form of similarity or closeness between any two vehicles (p. 15)

The link strength is quantified by lift to reflect how often the two products are compared by a population of customers. (p. 16)

To prune the network links, a thinning threshold at 1 is chosen for the lift value (p. 16)
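To make the lift idea concrete for myself: a minimal sketch, assuming the usual association-rule definition lift(i, j) = P(i and j considered together) / (P(i) · P(j)) and assuming that links are kept only when lift is strictly above the threshold. The function name and the toy consideration sets are hypothetical; the paper may define and prune lift slightly differently.

```python
from collections import Counter
from itertools import combinations


def lift_links(consideration_sets, threshold=1.0):
    """Return {(i, j): lift} for product pairs whose lift exceeds the threshold.

    consideration_sets: one collection of product ids per customer.
    """
    n_customers = len(consideration_sets)
    single = Counter()  # how many customers considered each product
    pair = Counter()    # how many customers considered both products of a pair
    for considered in consideration_sets:
        items = sorted(set(considered))
        single.update(items)
        pair.update(combinations(items, 2))

    links = {}
    for (i, j), n_ij in pair.items():
        # P(i, j) / (P(i) * P(j)) with probabilities estimated as frequencies
        lift = n_ij * n_customers / (single[i] * single[j])
        if lift > threshold:  # strict inequality is an assumption
            links[(i, j)] = lift
    return links


# Toy data: four customers and their consideration sets.
sets = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"C"}]
links = lift_links(sets)
# A-B has lift 4/3 and is kept; A-C has lift 2/3 and is pruned.
print(links)
```

A lift above 1 means the pair is co-considered more often than independence would predict, which is why 1 is a natural cut-off.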

As a measure of network centrality, the node degree calculates the number of links attached to a node. (p. 16)
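Degree is simple enough to compute from an edge list; a small sketch with made-up vehicle labels, assuming an undirected network with no duplicate links:

```python
from collections import defaultdict


def node_degrees(edges):
    """Count the number of links attached to each node of an undirected network."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


# Hypothetical association links between three vehicles.
print(node_degrees([("A", "B"), ("A", "C")]))  # {'A': 2, 'B': 1, 'C': 1}
```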

For the constructed vehicle network, the product community analysis is employed following Newman’s modularity method to determine groups of interconnected vehicles (p. 16)
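Newman's method searches for the partition that maximizes modularity Q. The sketch below only evaluates Q for a partition that is already given, using the standard form Q = Σ_c [L_c / m − (d_c / 2m)²], where m is the number of links, L_c the number of links inside community c, and d_c the total degree of its nodes. The search step itself is left out, and the example network is invented.

```python
def modularity(edges, community):
    """Newman modularity Q of a given partition of an undirected, unweighted network.

    edges: list of (u, v) pairs; community: dict mapping node -> community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}    # links with both ends in the same community
    degree_sum = {}  # sum of node degrees per community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Two triangles joined by a single bridge, split into their natural groups.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(modularity(edges, groups))  # 5/14, roughly 0.357
```

Putting every node in one community gives Q = 0, so a clearly positive Q signals more within-group links than expected by chance.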

As a refinement to the above undirected network, a directed network is constructed where a link direction is determined through both consideration and purchase data in NCBS. (p. 16)

Again, the links are trimmed to highlight positive associations (p. 16)

Figure 7. Unimodal vehicle networks constructed from NCBS 2013. Nodes are sized by network degrees (or in-degrees) and colored by network communities. Network layout is computed by the Fruchterman–Reingold force-directed algorithm based on aesthetic criteria (Fruchterman & Reingold 1991). Link strength is not specified and has no relation to the distance of nodes (p. 17)

With a directed network, graph metrics indicating node hierarchy, such as node in-degree, can be computed to reflect customers’ aggregated preferences across the population. (p. 18)
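In-degree in a directed network, sketched under my own assumption that each link is stored as a (source, target) pair pointing toward the preferred vehicle. How the paper orients its links from the consideration and purchase data is not restated here.

```python
from collections import Counter


def in_degrees(directed_edges):
    """Count incoming links per node; edges are (source, target) pairs."""
    return Counter(target for _source, target in directed_edges)


# Hypothetical links: A -> B, C -> B, B -> A.
print(in_degrees([("A", "B"), ("C", "B"), ("B", "A")]))
```

Nodes with no incoming links are simply absent from the result, which a `Counter` reports as zero when looked up.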

Our illustrative example shows that descriptive network analysis may serve as a useful tool for designers to determine product positioning and product priorities in the phase of design planning. Centrality, community, and hierarchy allow designers to uncover the root causes of the differences in vehicle sales under a specific market. (p. 18)

there is a need for an approach to quantitatively evaluate customer heterogeneous preferences while addressing issues such as dependent alternatives, multiple decisions, social influence, and correlated observations (p. 18)

4.2. Using MCPN for modeling luxury vehicle preferences in Central China (p. 18)

Our second implementation demonstrates the use of inferential network technique (ERGM) for analyzing the vehicle MCPN framework (Sections 3.3 and 3.4). (p. 18)

The proposed MCPN integrates a feature-driven product association network, a customer–product network, and a customer social network as a unified entity for analysis. The implementation of the proposed approach goes beyond the descriptive analysis and consists of three major steps: network construction, ERGM specification, and ERGM interpretation; each of these steps is explained in the remainder of this section. (p. 19)

4.2.1. Data transformation & network construction (p. 19)

Product associations. (p. 19)

In this example, product association links in the product layer of the multidimensional network are constructed using the complete set of attributes considered, including vehicle price, engine capacity, fuel consumption, and the existence of turbo. The association link is viewed as a form of overall product similarity from the perspective of engineering design. (p. 19)

Within the association network construct, the Gower’s coefficient (Gower 1971) is calculated to determine the existence of a link between any product pair. Gower’s coefficient has the capability to appropriately handle continuous, ordinal, nominal and binary variables as inputs. (p. 19)
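A minimal sketch of Gower's coefficient in its simplest equal-weight form: numeric attributes score 1 − |x − y| / range, categorical or binary attributes score 1 on a match and 0 otherwise, and the scores are averaged. The attribute names echo the ones listed above, but all values and ranges are invented, and the paper's exact variant (weights, treatment of ordinal variables) is not reproduced.

```python
def gower_similarity(a, b, ranges):
    """Equal-weight Gower similarity between two records.

    a, b: dicts mapping attribute name -> value (same keys in both).
    ranges: dict mapping each numeric attribute -> its range (max - min)
            over the whole data set; attributes absent from `ranges`
            are treated as categorical/binary.
    """
    scores = []
    for key in a:
        if key in ranges:
            spread = ranges[key]
            # identical values, or a constant attribute, count as a full match
            scores.append(1.0 if spread == 0 else 1.0 - abs(a[key] - b[key]) / spread)
        else:
            scores.append(1.0 if a[key] == b[key] else 0.0)
    return sum(scores) / len(scores)


# Hypothetical vehicles; price and engine capacity in arbitrary units.
car_x = {"price": 300, "engine": 2.0, "turbo": True}
car_y = {"price": 400, "engine": 2.0, "turbo": False}
ranges = {"price": 500, "engine": 2.0}
# Per-attribute scores are 0.8, 1.0 and 0.0, so the similarity is about 0.6.
print(gower_similarity(car_x, car_y, ranges))
```

Because every attribute is rescaled to [0, 1] before averaging, mixed units such as price and engine size can be combined without one dominating the other.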

Based on the empirical results, a global thinning threshold at 0.05 is chosen (p. 19)

Preference relations. We use the between-layer links connecting a product and a customer to model customers’ consideration decisions over vehicle models. The structure of these links is precisely defined by NCBS data. (p. 19)

Social relations. (p. 19)

Using the same strategy in our previous work on network simulation (He et al. 2014), a social space is constructed based on customer geographical locations and selected social attributes (age, income, education). (p. 19)

Hmm... so the links are not strictly ‘social’ but are based on customer attributes. (p. 19)

Based on the homophily assumption that two customers with shorter distance in the social space are more likely to be connected (p. 19)

To mimic the properties of real world networks, we then adjust the social links using the small-world model (Watts & Strogatz 1998; Delre et al. 2007) to assure the high transitivity (‘one's friends are likely to be friends’) and low average path length (‘six degrees of separation’ between any two individuals). (p. 20)
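My rough reading of this construction, as a sketch only: place customers in a numeric "social space", link each one to its nearest neighbours (homophily), then randomly rewire a fraction of links in the spirit of the small-world model. The function name, the parameters `k` and `rewire_prob`, and the coordinates are all my assumptions; the actual procedure in He et al. (2014) may differ.

```python
import math
import random


def build_social_links(positions, k=2, rewire_prob=0.1, seed=0):
    """Build undirected links from a social space, then rewire some at random.

    positions: dict mapping customer id -> tuple of numeric coordinates.
    Returns a set of frozenset({u, v}) links.
    """
    rng = random.Random(seed)
    nodes = list(positions)
    links = set()

    # Homophily step: connect each customer to its k nearest neighbours.
    for u in nodes:
        nearest = sorted(
            (v for v in nodes if v != u),
            key=lambda v: math.dist(positions[u], positions[v]),
        )[:k]
        for v in nearest:
            links.add(frozenset((u, v)))

    # Small-world step: with probability rewire_prob, replace one end of a
    # link by a randomly chosen customer not yet linked to the kept end.
    for link in sorted(links, key=sorted):
        if rng.random() < rewire_prob:
            u, _old = sorted(link)
            candidates = [
                w for w in nodes
                if w != u and frozenset((u, w)) not in links
            ]
            if candidates:
                links.remove(link)
                links.add(frozenset((u, rng.choice(candidates))))
    return links


# Hypothetical one-dimensional social space with two tight pairs.
positions = {"a": (0, 0), "b": (1, 0), "c": (10, 0), "d": (11, 0)}
print(build_social_links(positions, k=1, rewire_prob=0.0))
```

With `rewire_prob=0.0` the result is purely distance-based (a-b and c-d here); raising it adds the occasional long-range tie that shortens average path length.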

4.2.2. Specification of ERGMs for multidimensional networks (p. 20)

As presented in Table 4, the examined network effects are restricted to a subset of cross-level configurations and product/customer attributes of different forms. The choice of which network effect to include depends on the social theory, hypothesis, and the specific research questions to answer. (p. 20)

4.2.3. Comparisons and interpretations of ERGMs (p. 20)

Thus, we employ a stochastic approximation (Snijders 2002) that relies on MCMC simulations of graphs. (p. 20)

The results of the ERGM estimates for various specifications are presented in Table 4. The significant coefficient estimates are shown in bold font, (p. 20)

Model 1 formulates a bipartite ERGM analogous to a logistic model that contains only the attribute-relation effects composed by attributes of customers and products. (p. 22)
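Why "analogous to a logistic model": when an ERGM has no terms that make links depend on each other, the log-odds of a single customer–product link is just the weighted sum of that link's change statistics. A small sketch of that relationship follows; the coefficient names and values are invented for illustration and are not the estimates in Table 4.

```python
import math


def tie_probability(theta, change_stats):
    """Probability of one link under a dyad-independent ERGM.

    theta: dict mapping effect name -> coefficient.
    change_stats: dict mapping effect name -> change in that statistic
                  when the link is switched from absent to present.
    """
    log_odds = sum(theta[name] * change_stats[name] for name in theta)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Invented coefficients: a baseline density term plus two attribute effects.
theta = {"edges": -2.0, "price": -0.5, "turbo": 0.8}
# Invented covariates for one customer-vehicle pair.
change_stats = {"edges": 1, "price": 1.0, "turbo": 1}
print(tie_probability(theta, change_stats))
```

Once structural or cross-level terms are added (Models 2 and 3), the change statistics depend on the rest of the network, so this shortcut no longer gives independent link probabilities.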

Model 2 parameterizes a bipartite ERGM similar to Model 1 but with the addition of the pure structural effects and the cross-level product association effect. (p. 22)

The specification of Model 3 is the most complete model that includes all three types of ERGM effects. With the integration of the cross-level social influence effect, peer influences on preference decisions can be evaluated together with other product attributes, customer demographics, and structural patterns within the same model. (p. 22)

Two penalized-likelihood criteria – AIC and BIC – are also provided in Table 4. The two measures decrease gradually with the addition of the considered structural effects, suggesting improved fits from Model 1 to Model 3. (p. 22)
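For reference, the textbook definitions of the two criteria, AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, sketched below. The numbers in the example are placeholders and not values from Table 4, and what counts as the sample size n for a network model (for example, the number of possible links) is an assumption one has to make explicit.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Placeholder values: log-likelihood -100 with 5 parameters and 100 observations.
print(aic(-100.0, 5))       # 210.0
print(bic(-100.0, 5, 100))
```

Both criteria penalize extra parameters, so a drop from Model 1 to Model 3 means the added effects improve fit by more than their complexity costs.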

The interpretation of Model 1 is similar to that for a logistic model. The vehicle price has a negative significant sign, implying that lower price is preferred in consideration of luxury vehicles. The significant positive turbocharger and engine capacity indicate that the presence of the turbocharger and the increased size of the engine would increase the probability for a customer to consider a particular vehicle model. (p. 22)

The statistically negative first-time buyer suggests that first-time buyers are unlikely to enter the luxury vehicle market (p. 22)

In Model 2, the addition of the pure structural effects and the cross-level ‘customer considers similar products’ effect considerably changes the interpretation of the underlying preference data. (p. 22)

In contrast, the degree distribution is more centered for customer nodes, as shown by the negative consideration range coefficient, because customers only consider a limited number of vehicles (1–3) in NCBS data. (p. 22)

The coefficients of Model 3 are largely consistent with those in Model 2, except that the previously insignificant income becomes significant, while the previously significant engine capacity becomes insignificant. (p. 23)

The significant positive peer influence indicates that a customer is likely to become ‘irrational’ in decision-making and simply considers what his/her peer has considered. Modeling the peer influence is a unique contribution of our work as such effect cannot be modeled either theoretically or computationally without the MCPN framework. (p. 23)

First, including the attribute-relation main effects alone (Model 1) can explain a large part of the formation of preference links. (p. 23)

Finally, the peer influence effect (Model 3) introduces another layer of dependencies between two customers into the structure of the network. The significant positive estimate reflects the importance of social influence in explaining customer behavior and modeling product demand. Overall, the results of this example suggest that the nodal attributes (representing customer and product attributes) and network structures (representing product associations, social influences, and other underlying effects) are indispensable elements playing together in shaping the preferences of customers. (p. 23)

5. Discussion and conclusion (p. 23)

Though powerful and flexible, MNA has certain characteristics that need careful attention when implemented for preference modeling (p. 24)

Depending on the purpose of the analysis, the size of a network model can vary from a few nodes to hundreds or thousands of nodes containing a diverse set of products. However, the network model could be sensitive to the issue of missing data and influenced by how links are defined (Kossinets & Watts 2006). In addition, for a poorly specified model, degeneracy may occur in model estimation and cause the Markov chain to move towards an extreme graph of all or no edges (Snijders 2002). This issue can be solved by incorporating curved exponential family terms that exhibit more stable behavior in model construction (Snijders et al. 2006). (p. 25)

important analytical considerations (p. 25)