Social Network Analysis in R

Alison Link
19 November 2015

Problem

New workplace, large college, no idea of who knows whom.

Solution…some kind of network-y “Rolodex”?

Social network analysis "crash course"

Nodes & edges
Directed vs. undirected
Measures of centrality
- closeness
- degree centrality
- betweenness

Thanks, Coursera!

The packages

library(igraph)
library(networkD3) # the library formerly known as "d3Network"

Our raw data

Two columns of people: 1) a team member and 2) one of their “Rolodex” contacts

Additional columns with attributes:

some relating to edges (strength of relationship)
some relating to nodes (department affiliation, notes)
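For orientation, here is a tiny made-up data frame in the same shape. The first two columns form the edge list (who knows whom) and the rest carry attributes. Every name and value below is hypothetical; only the column names come from the names() call used when loading the real file.

```r
# A made-up miniature of the CSV: one row per (team member, contact) pair.
# All values are hypothetical.
toy_data <- data.frame(
  TELmember  = c("alink", "alink", "dwoldeab"),          # team member
  CLAcontact = c("contact_a", "contact_b", "contact_b"),  # their contact
  dept       = c("History", "Chemistry", "Chemistry"),    # node attribute (describes the contact)
  context    = c("committee", "course design", "workshop"),
  notes      = c("", "met at orientation", ""),           # node attribute
  strength   = c(1, 3, 2),                                # edge attribute (relationship strength)
  stringsAsFactors = FALSE
)
toy_data
```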

Load the data

Read in the .CSV…

raw_csv_data <- read.csv("TEL Network Map - Spring 2015.csv", header=TRUE)

names(raw_csv_data) <- c('TELmember', 'CLAcontact', 'dept', 'context', 'notes', 'strength')

#head(raw_csv_data)

Approach #1: iGraph

library(igraph)

Prep the data

We're in luck! The iGraph package's “graph_from_data_frame” takes as an argument…

“A data frame containing a symbolic edge list in the first two columns. Additional columns are considered as edge attributes. Since version 0.7 this argument is coerced to a data frame with as.data.frame.”

So, let's convert to an iGraph-friendly data format…

network_graph_data <- graph_from_data_frame(raw_csv_data, directed=FALSE)

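Note: This is the simplest way to coerce our data into an iGraph-friendly format. It treats every column after the first two as an edge attribute, including dept and notes, which describe the contact (a node) rather than the relationship. That may or may not be what we want. To attach vertex attributes as well, pass a separate data frame of nodes through the vertices argument of graph_from_data_frame (graph.data.frame is the older name of the same function).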
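A minimal sketch of the vertices argument, using two small hypothetical tables: an edge table that keeps only relationship-level columns, and a node table whose first column holds the vertex names. Every name that appears in the edge table must also appear in the node table, or graph_from_data_frame stops with an error.

```r
library(igraph)

# Hypothetical edge table: the two people, plus relationship-level columns only
edges <- data.frame(
  TELmember  = c("alink", "alink", "dwoldeab"),
  CLAcontact = c("contact_a", "contact_b", "contact_b"),
  strength   = c(1, 3, 2),
  stringsAsFactors = FALSE
)

# Hypothetical node table: first column = vertex names, other columns = vertex attributes
nodes <- data.frame(
  name = c("alink", "dwoldeab", "contact_a", "contact_b"),
  dept = c("TEL", "TEL", "History", "Chemistry"),
  stringsAsFactors = FALSE
)

g <- graph_from_data_frame(edges, directed = FALSE, vertices = nodes)

vertex_attr_names(g)   # "name" "dept"
edge_attr_names(g)     # "strength"
V(g)$dept              # "TEL" "TEL" "History" "Chemistry"
```

In the real CSV, dept and notes repeat on every row for the same contact, so a node table would first need de-duplicating (for example with unique()).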

Inspect the data

Look at the vertices…

V(network_graph_data)

Look at the edges…

E(network_graph_data)

Plot the data

plot(network_graph_data)


Or, for a sweet '90s-looking X-windowed interactive version:

tkplot(network_graph_data)

Style the plot

library(car) # use this for its "recode" function

Vertex styling

vertex_attr_names(network_graph_data)

[1] "name"

V(network_graph_data)$color <- recode(V(network_graph_data)$name, "'alink' = 'maroon'; else= 'yellow'")

Edge styling

edge_attr_names(network_graph_data)

[1] "dept"     "context"  "notes"    "strength"

E(network_graph_data)$weight <- recode(E(network_graph_data)$strength, "'1' = 1; '2' = 3; '3' = 7; else = 1")
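If you would rather not load car just for recode(), a named lookup vector does the same 1 → 1, 2 → 3, 3 → 7 mapping in base R. This sketch uses a made-up strength vector. One design note: an edge attribute called width is picked up automatically by igraph's plot(), whereas an attribute called weight is also picked up by closeness() and betweenness(), which read it as a distance. So storing the result as width is a safer choice for purely visual styling.

```r
# Hypothetical strength codes as they might arrive from the CSV (NA = missing)
strength <- c(1, 3, 2, NA, 3)

# Same mapping as the recode() call: 1 -> 1, 2 -> 3, 3 -> 7
width_lookup <- c("1" = 1, "2" = 3, "3" = 7)

edge_width <- unname(width_lookup[as.character(strength)])
edge_width[is.na(edge_width)] <- 1   # the "else = 1" branch: missing or unexpected codes
edge_width   # 1 7 3 1 7

# On the real graph, the input would be E(network_graph_data)$strength and the
# result could be stored as E(network_graph_data)$width
```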

Apply the styling

plot.igraph(network_graph_data, layout = layout_with_fr,
     main="Our Team Network - Spring 2015", 
     vertex.color = V(network_graph_data)$color, 
     vertex.frame.color = "grey",
     vertex.label = V(network_graph_data)$name,
     vertex.label.color = "black",
     vertex.label.cex = 0.7,
     vertex.label.family="sans",
     edge.width = E(network_graph_data)$weight)


Approach #2: networkD3

library(networkD3) # the library formerly known as "d3Network"

Plot the data

simpleNetwork(raw_csv_data)
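simpleNetwork() reads only two columns, the first two by default, as source and target. The attribute columns in raw_csv_data are therefore ignored. Below is a sketch with a hypothetical two-column edge list that names those columns explicitly and turns on zooming. The fontSize value is an arbitrary choice. To share the result, networkD3's saveNetwork() can write the widget to an HTML file.

```r
library(networkD3)

# Hypothetical edge list in the same layout as the first two CSV columns
links <- data.frame(
  TELmember  = c("alink", "alink", "dwoldeab"),
  CLAcontact = c("contact_a", "contact_b", "contact_b"),
  stringsAsFactors = FALSE
)

# Name the source/target columns explicitly; zoom = TRUE enables pan and zoom
net <- simpleNetwork(links, Source = "TELmember", Target = "CLAcontact",
                     fontSize = 12, zoom = TRUE)
net  # renders as an interactive widget in RStudio or a knitted HTML page
```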

But I like numbers. Where are the numbers?

Closeness

Answers the “Kevin Bacon” question:

“How many steps are required to access every other vertex from a given vertex?”

One practical implication of this metric: it helps you gauge how information might spread within your network, and who might be the best people to leverage if you need to make sure information gets around.
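To see what closeness() is summing, here is a from-scratch sketch in base R. It counts plain steps with breadth-first search on a hypothetical four-person chain. It ignores edge weights and assumes every node can reach every other, so it illustrates the definition rather than reproducing igraph's implementation.

```r
# Hypothetical toy network: the chain a - b - c - d, stored as an adjacency list
adj <- list(a = "b", b = c("a", "c"), c = c("b", "d"), d = "c")

# Breadth-first search: number of steps from `source` to every other node
bfs_steps <- function(adj, source) {
  steps <- setNames(rep(NA_real_, length(adj)), names(adj))
  steps[source] <- 0
  queue <- source
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    for (w in adj[[v]]) {
      if (is.na(steps[w])) {        # first visit = fewest steps
        steps[w] <- steps[v] + 1
        queue <- c(queue, w)
      }
    }
  }
  steps
}

# Closeness = 1 / (sum of shortest-path steps to every other node).
# Assumes a connected network; otherwise the sum is NA.
closeness_from_scratch <- function(adj) {
  sapply(names(adj), function(v) 1 / sum(bfs_steps(adj, v)))
}

cl <- closeness_from_scratch(adj)
1 / cl   # a: 6, b: 4, c: 4, d: 6  (a needs 1 + 2 + 3 steps to reach b, c, d)
```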

closeness_stats <- closeness(network_graph_data, vids = V(network_graph_data), mode = "all", weights = NULL, normalized = FALSE)

closeness_stats[3:4]

      alink    dwoldeab 
0.001156069 0.001964637

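Closeness is the inverse of the sum of the shortest-path distances from a node to every other node in the network, so taking the reciprocal gives back that sum. One caveat: the graph already has a weight edge attribute from the styling step. With weights = NULL, closeness() uses that attribute and treats each weight as an edge length. Strong ties (weight 7) therefore count as longer hops than weak ones (weight 1). Passing weights = NA counts plain steps instead.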

1/closeness_stats[3:4]

   alink dwoldeab 
     865      509
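A quick way to check whether edge weights are creeping into the closeness scores is to compare weights = NA with the default on a small hypothetical graph. When a data frame passed to graph_from_data_frame has a column named weight, that column becomes the graph's weight edge attribute, exactly as assigning E(g)$weight does.

```r
library(igraph)

# Hypothetical chain a - b - c: a strong tie a-b (coded 7) and a weak tie b-c (coded 1).
# The third column, named "weight", becomes the weight edge attribute.
g <- graph_from_data_frame(
  data.frame(from = c("a", "b"), to = c("b", "c"), weight = c(7, 1)),
  directed = FALSE
)

closeness(g, weights = NA)  # plain steps:       a = 1 / (1 + 2) = 1/3
closeness(g)                # weight as length:  a = 1 / (7 + 8) = 1/15
```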

Degree Centrality (Bonacich's power centrality)

Each node's centrality score depends both on how many connections it has and how many connections its connections have.

“It doesn't only matter how many friends you have. It matters how many friends your friends have.”

This also leads to some philosophical questions about power…

Are you more powerful if you're connected to powerful people? (positive attenuation factor)

Or are you more powerful if you're connected to weak, dependent people? (negative attenuation factor)
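To make the attenuation factor concrete, here is a from-scratch sketch of Bonacich's formula, c = alpha * (I - beta*A)^(-1) * A * 1, in base R. A is the adjacency matrix, and beta corresponds to what igraph's power_centrality() calls exponent. The scaling step (alpha chosen so the squared scores sum to the number of nodes) is an assumption intended to mirror power_centrality(..., rescale = FALSE). The star graph is a hypothetical toy example.

The underlying sum over walks converges only when |beta| < 1 / (largest eigenvalue of A). In any network where someone has two or more contacts, that eigenvalue exceeds 1. The exponent = 1 used in the call below is therefore outside that range, which may be why its scores come out negative.

```r
# Hypothetical toy network: a star with one hub and three leaves
ids <- c("hub", "l1", "l2", "l3")
A <- matrix(0, 4, 4, dimnames = list(ids, ids))
A["hub", c("l1", "l2", "l3")] <- 1
A[c("l1", "l2", "l3"), "hub"] <- 1

# Bonacich power centrality: c = alpha * (I - beta * A)^(-1) %*% A %*% 1
bonacich_power <- function(A, beta) {
  n <- nrow(A)
  raw <- drop(solve(diag(n) - beta * A, A %*% rep(1, n)))
  # Assumed scaling: choose alpha so the squared scores sum to n
  setNames(raw * sqrt(n / sum(raw^2)), rownames(A))
}

# Safe range for beta: |beta| < 1 / largest eigenvalue of A (about 0.577 here)
1 / max(eigen(A, symmetric = TRUE)$values)

# Positive beta rewards ties to well-connected people;
# negative beta rewards ties to people with few other options
for (b in c(0.2, 0, -0.2)) {
  s <- bonacich_power(A, b)
  cat(sprintf("beta = %4.1f  hub/leaf ratio = %.2f\n", b, s[["hub"]] / s[["l1"]]))
}
# beta =  0.2  hub/leaf ratio = 2.25
# beta =  0.0  hub/leaf ratio = 3.00
# beta = -0.2  hub/leaf ratio = 6.00
```

With beta = 0 the scores are simply proportional to degree (the hub has 3 contacts, each leaf has 1). This is why Bonacich's measure is often filed under degree centrality.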

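# exponent is Bonacich's beta (the attenuation factor); scores are well-behaved only when |beta| < 1 / (largest eigenvalue of the adjacency matrix)
power_centrality_stats <- power_centrality(network_graph_data, nodes = V(network_graph_data), loops = FALSE, exponent = 1, rescale = FALSE, tol = 1e-07, sparse = TRUE)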

power_centrality_stats[3:4]

    alink  dwoldeab 
-1.631313 -1.770684

Betweenness

Answers the “telephone” question:

“If this network were a game of telephone, who would have the greatest potential to mess up the game?”

People (the nodes, not the edges) with high betweenness are the “social brokers” in a network: they sit on many of the shortest paths between other people.
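Betweenness adds up, over every pair of other nodes, the fraction of their shortest paths that pass through a given node. The sketch below is a base-R version of Brandes' algorithm for an unweighted, undirected network. It runs on a hypothetical graph of two triangles joined only through a single person, x, who ends up as the broker. It illustrates the idea and is not igraph's implementation. On this graph, igraph's betweenness(g, directed = FALSE, weights = NA) should report the same values.

```r
# Hypothetical network: triangle a-b-c and triangle d-e-f, joined only through x
adj <- list(
  a = c("b", "c"), b = c("a", "c"), c = c("a", "b", "x"),
  x = c("c", "d"),
  d = c("x", "e", "f"), e = c("d", "f"), f = c("d", "e")
)

# Brandes' algorithm for unweighted, undirected betweenness
brandes_betweenness <- function(adj) {
  nodes <- names(adj)
  zero <- setNames(numeric(length(nodes)), nodes)
  cb <- zero
  for (s in nodes) {
    # 1. BFS from s: distances, number of shortest paths (sigma), predecessors
    dist  <- setNames(rep(-1, length(nodes)), nodes)
    sigma <- zero
    pred  <- setNames(vector("list", length(nodes)), nodes)
    dist[s] <- 0
    sigma[s] <- 1
    queue <- s
    visited <- character(0)             # nodes in the order BFS settles them
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      visited <- c(visited, v)
      for (w in adj[[v]]) {
        if (dist[w] < 0) {              # first time w is reached
          dist[w] <- dist[v] + 1
          queue <- c(queue, w)
        }
        if (dist[w] == dist[v] + 1) {   # v lies on a shortest path to w
          sigma[w] <- sigma[w] + sigma[v]
          pred[[w]] <- c(pred[[w]], v)
        }
      }
    }
    # 2. Walk back from the farthest nodes, accumulating each node's share
    delta <- zero
    for (w in rev(visited)) {
      for (v in pred[[w]]) {
        delta[v] <- delta[v] + sigma[v] / sigma[w] * (1 + delta[w])
      }
      if (w != s) cb[w] <- cb[w] + delta[w]
    }
  }
  cb / 2  # undirected: every pair was counted once from each end
}

bt <- brandes_betweenness(adj)
bt  # a = 0, b = 0, c = 8, x = 9, d = 8, e = 0, f = 0
```

x scores 9 because all 3 × 3 = 9 pairs that span the two triangles must go through x.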

# weights = NULL uses the "weight" edge attribute (read as distances); weights = NA ignores it
betweenness_stats <- betweenness(network_graph_data, v = V(network_graph_data), directed = FALSE, weights = NULL, normalized = FALSE)

betweenness_stats[3:4]

   alink dwoldeab 
1908.700 4942.867

Additional resources

Adamic, L. “Social network analysis”. Coursera. https://www.coursera.org/course/sna
Csardi, G. “Practical statistical network analysis”. http://statmath.wu.ac.at/research/friday/resources_WS0708_SS08/igraph.pdf
Hanneman, R. & Riddle, M. “Introduction to social network methods”. http://faculty.ucr.edu/~hanneman/nettext/C10_Centrality.html
iGraph R package documentation. http://igraph.org/r/doc/
Statistical Analysis of Network Data (Springer)