read

References

Citekey: @Jiang2013

Jiang, F., Wang, J., Hindle, A., & Nascimento, M. A. (2013). Mining the Temporal Evolution of the Android Bug Reporting Community via Sliding Windows. CoRR, abs/1310.7. Retrieved from http://arxiv.org/abs/1310.7469

Notes

This paper reports a study of applying an interesting sliding-windows social network analysis on Android bug reporting community. This technique shows clear advantage in comparison with global analysis of social networks in that it takes temporal information into consideration. It is especially useful for studying interactions that expand over an extended period of time.

For user clustering specifically, this study applies K-means clustering on betweenness centrality measures across sliding windows. This is radically different from clustering based on interaction structure in a global analysis of social network. The clustering used in this paper has great value. (In addition, this study normalized the betweenness centrality measure so that it will not be affected by the size of networks; when comparing sums of centrality measure across time, note that the total number of participants stay the same. So comparisons made in this study are quite meaningful to me.)

Highlights

Unfortunately, for tools like Social Network Analysis (SNA), a global analysis can miss a lot of important interactions (p. 1)

In order to apply SNA to the bug repository, we first create the graph based on the interactions. We pose that each node of the graph represents one bug participant and each edge represents the connection between two participants who have communicated on the same bug. (p. 2)

We use betweenness centrality to quantify the importance of a participant in the community [13] (p. 2)

When participants have high betweenness, they might have: 1) reported quantities of bugs with at least one comment on them, 2) made lots of comments, 3) reported a very critical bug which attracts comments, 4) or made a very interesting comment which attracts comments from other participants. (p. 2)

In order to peer into these local self organized structures using social network analysis, we felt it is better to choose a windowed approach, [3, 14]. Windowing allows us to look at network during a slice of time and then relate our measures (betweenness centrality per author) to the next window and beyond. This sliding window view of centrality allows us to see those developers and users who are constantly at the forefront of discussion or those who ebb and flow between issues and tasks. Moreover, by sliding windows, each pair of adjacent windows would have an overlap, which results in smoother trends, and more importantly, helps to maintain context. Other benefits provided by time windowed analysis is that it gives a more accurate and nuanced view of the data as locally central participants then will not be “drowned out”. (p. 2)

Global research questions: RQ1. How does the number of active bug participants change over time? Why? RQ2. How does the betweenness centrality of a participant change over time? What are the reasons when they have a certain activity pattern? Local research questions: RQ3. Are there special time ranges during which participants are more/less active or central than normal? Why? RQ4. What are the possible scenarios for a very sharp change of the participants’ centrality? Why? (p. 2)

2 Background (p. 3)

2.1 Betweenness Centrality (p. 3)

In our work, we normalize the betweenness centrality values to eliminate the effect of different sizes of the networks. The betweenness is normalized as: (p. 3)

Compared with simply counting the total number of comments or total bug reports of a participant, betweenness acts better to reflect the interactions among people. For example, when a person reports lots of bugs but none of them attract any comment, it is very likely that his bug reports are not interesting or important. (p. 4)

2.2 Overlapping Time Windowing (p. 4)

Moreover, time windowed analysis could give a more accurate and nuanced view of the data [3, 14], as locally central participants would not be “drowned out”. (p. 4)

Another point is that, comments on the same bug might not be globally temporally relevant [17, 18] thus a global time analysis would not make much sense in this case. (p. 4)

Figure 2: Betweenness centrality along time line: the x-axis represents the number of time windows and the starting dates are denoted every 100 windows. The y-axis represents the number of bug participants who have ever been central in the bug community with betweenness centrality valued greater than 0 for some time period. The color represents the value of betweenness centrality, with darker colors corresponding to lower betweenness and lighter colors for higher betweenness. We used K-means clustering with cosine distance where K = 100 (p. 5)

2.3 Clustering (p. 5)

Figure 3(1) orders participants by their betweenness values. We can indeed find that there are participants of low overall betweenness but being very active (show up bright) (p. 5)

In order to perceive clusters, that are local groups of interactions, we clustered the bug participants using K-means by their betweenness centrality distribution along the time line. K-means is one of the most popular clustering methods which aims to partition n data items into k clusters that each data item belongs to the cluster with the nearest mean [19]. (p. 5)

In this case, cosine distance calculates the similarity between each pair of participants in terms of temporal centrality whereas Euclidean distance focuses on the magnitude of data, the size and frequency of centrality. (p. 6)

3 Methodology (p. 6)

3.1 Data (p. 6)

Step 1: Pruning the data. (p. 7)

Step 2: Windowing the records. We windowed the data into periods of 30 days with a 29-day overlap. 30 days was chosen as a window size because it is smaller than the periods between a major and minor release, it is similar to a month of work, but long enough to contain the resolution of multiple bugs. We have compared sliding by 1 day with our previous result of sliding by 7 days, 1 day sliding produces gradual and smoother transitions of centrality. (p. 7)

Step 3: Establishing the network. We made a tool to perform the SNA with sliding windows. The tool is implemented in Java and built on top of the JUNG Graph Framework, that converted bug reports and bug comment records within a window to a social network graph. (p. 7)

Step 4: Calculating the centrality. (p. 7)

Step 5: Removing irrelevant participants. We removed the participants with betweenness centrality value 0 (p. 7)

Step 6: Generating the analysis graph. (p. 7)

3.3 Validation using the Android Release History and the Git (p. 8)

4 Results and Analysis (p. 8)

RQ1. How does the number of active bug participants change over time? Why? To give an overview, we compared the interaction of bug participants between January, 2010 and December, 2011, and found that the interaction among participants in the Android bug community in 2011 was similar to the interaction of participants in 2010 but more frequent, as we can see in Figure 2. One “gap” occurs around window 300, which we will explain in the next Local Analysis subsection. (p. 8)

RQ2. How does the betweenness centrality of a participant change over time? What are the reasons when they have a certain activity pattern? Observing the continuity of betweenness centrality in Figure 2, some participants have kept active during the entire two years, and correspondingly they have a very continuous and bright line. (p. 10)

However, in most cases, participants’ betweenness values are highly variant, as observed in Figure 2. To investigate the variation in betweenness values over time, we decided to count the number of times that a user experienced a range of consecutive (p. 10)

windows in which the user had non-zero betweenness. Participants with a count of distinct ranges greater than 1 would be phasers who periodically participate within the Android bug community. Here phasers are those who phase into centrality and later out of it. (p. 11)

among the 1654 participants with betweenness values larger than 0, we analyzed their centrality patterns and divide them into three categories: 1) participants appeared only once with a betweeness greater than 0 (71 out of 1654 participants), 2) participants recurred periodically (1575 participants) and 3) participants who are central along the entire project history (8 participants). (p. 11)

RQ4. What are the possible scenarios for a very sharp change of the participants’ centrality? Why? (p. 11)

almost all of them has experienced centrality oscillations. In addition, some participants tend to become active and core members during the same time period and then they fade away together. (p. 11)

5 Validation (p. 12)

5.1 Activity pattern validation (p. 12)

we notice that there is a small group of participants who have been central for most of our analysis period (8 participants out of the 1654); another relatively larger group appear without any recurrence (71 participants out of the 1654); the majority would periodically become central in their community (1575 participants out of the 1654). (p. 12)

5.1.1 Participants that appeared only once tend to be pure users We look into the git repository to find the files submitted by the 71 participants who have appeared only once in the bug community. (p. 12)

5.1.2 Participants showed up periodically should be a combination of users and developers Periodically appearing participants are the majority and we call them phasers. (p. 12)

5.1.3 Participants who were continuously central for a long time period could have multiple areas of expertise 5 out of the 8 participants in this group have submissions in the git. (p. 13)

5.2 Cluster validation (p. 15)

5.2.1 Participants clustered together share similar areas of expertise and tasks (p. 16)

5.2.2 Clusters’ working areas of expertise are in accordance with the release contents along the time line (p. 16)

Blog Logo

Bodong Chen


Published

Image

Crisscross Landscapes

Bodong Chen, University of Minnesota

Back to Home