
I am working with the employment histories of people in the financial industry and would like to build an edge list so I can visualize them as a Sankey diagram. So far, my data are strings of comma-separated entities, like this:

A, B, D
C, A, E, B
F, B

etc.

One of these companies is of particular interest (call it Company B). I need to turn the data above into something resembling this:

A, B
B, D
C, B
A, B
E, B
F, B

etc.

Again, the focus is on Company B, so I need a way to filter on that company specifically and to handle strings of varying length. In the end, I need an edge list in which every row contains Company B, paired with each of the other companies that appear in the same comma-separated string.

    Welcome to SO! What have you tried that has not worked? Please see [how to make a great reproducible question](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) as well. – hrbrmstr Oct 18 '15 at 14:06

1 Answer


There are several ways to do this in R. Here is one in base R:

myc <- c("A,B,D", "C,A,E,B", "F,B")
myc <- strsplit(myc, ",")  # split each string on commas

res <- lapply(myc, combn, 2, simplify = FALSE)  # all pairs within each history
out <- matrix(unlist(res), ncol = 2, byrow = TRUE)  # one pair per row (character matrix)
out[colSums(apply(out, 1, match, "B"), na.rm = TRUE) == 1, ]  # keep only pairs containing "B"
     [,1] [,2]
[1,] "A"  "B" 
[2,] "B"  "D" 
[3,] "C"  "B" 
[4,] "A"  "B" 
[5,] "E"  "B" 
[6,] "F"  "B" 
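
If the histories are long, generating every pair just to discard most of them can be wasteful. A possible alternative, sketched here under some assumptions, is to locate B in each history and pair it directly with everything before and after it. The helper name `edges_for` and its `focal` argument are made up for illustration, and the sketch assumes each company appears at most once per history.

```r
# Hypothetical helper: pair the focal company with every other company
# in one history, keeping the original left-to-right order.
# Assumes the focal company appears at most once per history.
edges_for <- function(history, focal = "B") {
  pos <- match(focal, history)          # first position of the focal company
  if (is.na(pos)) return(NULL)          # history never mentions the focal company
  before <- history[seq_len(pos - 1)]   # companies listed before the focal one
  after  <- history[-seq_len(pos)]      # companies listed after the focal one
  cbind(from = c(before, rep(focal, length(after))),
        to   = c(rep(focal, length(before)), after))
}

# Split on a comma followed by optional whitespace, so "A, B" and "A,B" both work.
histories <- strsplit(c("A, B, D", "C, A, E, B", "F, B"), ",\\s*")

# Stack the per-history edge matrices; NULL results are dropped by rbind.
edges <- do.call(rbind, lapply(histories, edges_for))
edges
```

On the sample data this should yield the same six rows as the matrix above, with the columns named `from` and `to`.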
Pierre Lapointe
  • Here is a slightly shorter variation: `s <- unlist(lapply(myc, function(x) grep("B", combn(x, 2, toString), value = TRUE))); read.table(text = s, sep = ",", as.is = TRUE)`. If the output is wanted in the form of comma-separated strings then omit the `read.table` call. – G. Grothendieck Oct 18 '15 at 17:15
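
For anyone adapting the shorter variation, here is a spelled-out sketch of it. Two details are worth knowing: `toString` joins with a comma plus a space, so `read.table` keeps a leading space in the second column unless `strip.white = TRUE` is set, and `grep` matches substrings, so a company whose name merely contains "B" would also be kept. The `fixed`, `strip.white`, and `colClasses` arguments below are my additions, not part of the comment above.

```r
myc <- strsplit(c("A,B,D", "C,A,E,B", "F,B"), ",")

# Build "X, Y" strings for every pair and keep those that mention B.
# fixed = TRUE treats "B" as a literal string; this is still a substring
# match, so a name such as "AB" would also be kept.
s <- unlist(lapply(myc, function(x) {
  grep("B", combn(x, 2, toString), value = TRUE, fixed = TRUE)
}))

# strip.white = TRUE drops the space that toString() puts after the comma;
# colClasses = "character" stops names like "T" or "F" being read as logicals.
edges <- read.table(text = s, sep = ",", strip.white = TRUE,
                    colClasses = "character")
edges
```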