
I have a data frame that's of this structure:

df <- data.frame(var1 = c(1,1,1,2,2,3,3,3,3),
                 cat1 = c("A","B","D","B","C","D","E","B","A"))

> df
  var1 cat1
1    1    A
2    1    B
3    1    D
4    2    B
5    2    C
6    3    D
7    3    E
8    3    B
9    3    A

I want to create both a nodes and an edges data frame from it so that I can draw a network graph with visNetwork. The network should show the number (strength) of connections between the different cat1 values, where two cat1 values are connected if they share a var1 value.

I have the nodes data frame sorted:

nodes <- data.frame(id = unique(df$cat1))
> nodes
  id
1  A
2  B
3  D
4  C
5  E

What I'd like help with is processing df as follows: for each distinct value of var1, take every pair of cat1 values that appear with it, then count how many var1 groups each pair appears in. The result should be an edges data frame like the one below. I'm not bothered about the direction of the edges; all I need is that the nodes are connected.

> edges
  from to value
1    A  B     2
2    A  D     2
3    A  E     1
4    B  C     1
5    B  D     2
6    B  E     1
7    D  E     1

With thanks in anticipation, Nevil

Update: I found a similar problem here and adapted its code to give the following, which is close to what I want but not quite there:

    library(dplyr)

    df %>%
      group_by(var1) %>%
      filter(n() >= 2) %>%
      do(data.frame(t(combn(.$cat1, 2, function(x) sort(x))),
                    stringsAsFactors = FALSE))

# A tibble: 10 x 3
# Groups:   var1 [3]
    var1 X1    X2   
   <dbl> <chr> <chr>
 1    1. A     B    
 2    1. A     D    
 3    1. B     D    
 4    2. B     C    
 5    3. D     E    
 6    3. B     D    
 7    3. A     D    
 8    3. B     E    
 9    3. A     E    
10    3. A     B  
CJ Yetman
Nevil
  • Hello Sorif. I thought I had shown the 'edges' dataframe that I am seeking to generate from the original df. Am I missing something? – Nevil Apr 07 '18 at 12:37

2 Answers


I don't know whether there is already a suitable function for this task, so here is a step-by-step procedure. With this, you should be able to define your own function. Hope it helps!

# create an adjacency matrix
mat <- table(df)
mat <- t(mat) %*% mat 
as.table(mat) # look at your adjacency matrix
# since the network is not directed, we can consider only the (strictly) upper triangular matrix 
mat[lower.tri(mat, diag = TRUE)] <- 0
as.table(mat) # look at the new adjacency matrix

library(dplyr)
edges <- as.data.frame(as.table(mat))
edges <- filter(edges, Freq != 0)
colnames(edges) <- c("from", "to", "value")
edges <- arrange(edges, from)
edges # output

#  from to value
#1    A  B     2
#2    A  D     2
#3    A  E     1
#4    B  C     1
#5    B  D     2
#6    B  E     1
#7    D  E     1
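
As a minimal sketch of the "define your own function" suggestion, the steps above could be wrapped as follows. The function name `cooccurrence_edges` and its arguments are hypothetical, not part of any package, and the sketch assumes a data frame with one grouping column and one node column, as in the question.

```r
df <- data.frame(var1 = c(1,1,1,2,2,3,3,3,3),
                 cat1 = c("A","B","D","B","C","D","E","B","A"))

# Hypothetical helper: name and arguments are assumptions for illustration.
cooccurrence_edges <- function(data, group_col = "var1", node_col = "cat1") {
  # Drop repeated (group, node) rows so they cannot inflate the counts.
  pairs <- unique(data[, c(group_col, node_col)])
  # Incidence table: rows are groups, columns are nodes.
  incidence <- table(pairs[[group_col]], as.character(pairs[[node_col]]))
  # Entry [i, j] is the number of groups that contain both node i and node j.
  co <- crossprod(incidence)
  # Undirected network: keep only the strictly upper triangle.
  co[lower.tri(co, diag = TRUE)] <- 0
  edges <- as.data.frame(as.table(co), stringsAsFactors = FALSE)
  names(edges) <- c("from", "to", "value")
  edges <- edges[edges$value != 0, ]
  edges <- edges[order(edges$from, edges$to), ]
  rownames(edges) <- NULL
  edges
}

edges <- cooccurrence_edges(df)
edges
```

Calling `unique()` first is a design choice in this sketch: the matrix product only counts shared groups correctly when each (var1, cat1) combination appears once.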
nghauran

Here are a few other ways. Each returns one row per pair within each var1 group, with value holding the var1 value.

In base R:

values <- unique(df$var1[duplicated(df$var1)])

do.call(rbind,
  lapply(values, function(i) {
    nodes <- as.character(df$cat1[df$var1 == i])
    edges <- combn(nodes, 2)
    data.frame(from = edges[1, ],
               to = edges[2, ],
               value = i,
               stringsAsFactors = FALSE)
  })
)
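
The base R approach above lists each pair once per var1 group. To get the tallied edges the question asks for, one option is to sort the nodes within each group (so that A-B and B-A are treated as the same pair) and then sum over the pairs. This is a sketch under the assumption that the sample df from the question is used; the intermediate name `pairs` is introduced only for illustration.

```r
df <- data.frame(var1 = c(1,1,1,2,2,3,3,3,3),
                 cat1 = c("A","B","D","B","C","D","E","B","A"))

pairs <- do.call(rbind,
  lapply(unique(df$var1), function(i) {
    # Sorted, de-duplicated nodes so every pair comes out as from < to.
    nodes <- sort(unique(as.character(df$cat1[df$var1 == i])))
    # A group with fewer than two nodes contributes no edges.
    if (length(nodes) < 2) return(NULL)
    p <- combn(nodes, 2)
    data.frame(from = p[1, ], to = p[2, ], stringsAsFactors = FALSE)
  })
)

# Each row is one co-occurrence; summing per pair gives the edge weight.
pairs$value <- 1
edges <- aggregate(value ~ from + to, data = pairs, FUN = sum)
edges <- edges[order(edges$from, edges$to), ]
rownames(edges) <- NULL
edges
```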

In the tidyverse:

library(dplyr)
library(tidyr)

df %>%
  group_by(var1) %>%
  filter(n() >= 2) %>%
  mutate(cat1 = as.character(cat1)) %>% 
  summarise(edges = list(data.frame(t(combn(cat1, 2)), stringsAsFactors = FALSE))) %>%
  unnest(edges) %>% 
  select(from = X1, to = X2, value = var1)

In the tidyverse, using tidyr::complete:

library(dplyr)
library(tidyr)

df %>%
  group_by(var1) %>%
  mutate(cat1 = as.character(cat1)) %>% 
  mutate(i.cat1 = cat1) %>% 
  complete(cat1, i.cat1) %>% 
  filter(cat1 < i.cat1) %>% 
  select(from = cat1, to = i.cat1, value = var1)

In the tidyverse, using tidyr::expand:

library(dplyr)
library(tidyr)

df %>%
  group_by(var1) %>%
  mutate(cat1 = as.character(cat1)) %>%
  expand(cat1, to = cat1) %>% 
  filter(cat1 < to) %>% 
  select(from = cat1, to, value = var1)
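
The tidyr::expand version above can be extended to produce the tallied edges by counting how many var1 groups each pair appears in. This is a minimal sketch assuming the sample df from the question and a dplyr version that supports the `name` argument of `count()`.

```r
library(dplyr)
library(tidyr)

df <- data.frame(var1 = c(1,1,1,2,2,3,3,3,3),
                 cat1 = c("A","B","D","B","C","D","E","B","A"))

edges <- df %>%
  mutate(cat1 = as.character(cat1)) %>%
  distinct(var1, cat1) %>%            # guard against repeated rows
  group_by(var1) %>%
  expand(from = cat1, to = cat1) %>%  # all ordered pairs within each group
  ungroup() %>%
  filter(from < to) %>%               # keep each unordered pair once
  count(from, to, name = "value")     # number of groups sharing the pair

edges
```

The resulting from, to and value columns match the layout the question asks for, so the data frame could then be passed, together with nodes, to `visNetwork(nodes, edges)`.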
CJ Yetman