
I am working on a directed graph and need some advice on generating a particular edge attribute.

I need to use both the count of interactions and another property of the interaction (the average length of text exchanged between the same unique from/to pair) in my visualization.

I am struggling to figure out how to create this output in a clean, scalable way. Below is my current input, solution, and output. I have also included an ideal output along with some things I have tried.

Input

network <- read.table(text = "
Actor Receiver Length
1       1       4
1       2       20
1       3       9
1       3       100
1       3       15
2       3       38
3       1       25
3       1       17",
sep = "", header = TRUE)

I am currently using dplyr to get a count of how many times each pair appears to achieve the output below.

I use the following command:

EDGE <- dplyr::count(network, Actor, Receiver)
names(EDGE) <- c("From", "To", "Count")

This gives my current output:

From    To Count 
1       1    1       
1       2    1      
1       3    3          
2       3    1
3       1    2

Ideally, however, I would like to know the average length for each pair as well, ending up with something like this:

From    To Count AverageLength
1       1    1         4 
1       2    1         20
1       3    3         41
2       3    1         38
3       1    2         21

Is there any way to do this without creating a host of new data frames and then grafting them back onto the output? My main problem is summarizing and counting at the same time. My naive attempt was simply to add "Length" as an argument to the count function, which does not produce anything useful. I could also combine Actor and Receiver into a single key and then use a summary function to create something to graft onto the result of the count. In the interest of scaling, however, I would like to find out whether there is a simple and clear way of doing this.

Thank you very much for any assistance with this issue.

EVie
  • Check out `group_by` and `summarise` (which collapses groups, calculating whatever summary variables you tell it to), which are part of the core of dplyr: `network %>% group_by(Actor, Receiver) %>% summarise(Count = n(), AverageLength = mean(Length))`. Also, you have some typos in your input data (missing comma, name where you need `"text"`). – alistaire Jul 12 '16 at 20:25
  • @alistaire, this is much more elegant than my answer below; maybe post it as an answer. Anyway, don't forget to mention loading `library(magrittr)`. – dof1985 Jul 12 '16 at 20:55
  • Or just `aggregate(Length ~., x, mean)` – David Arenburg Jul 12 '16 at 21:02
  • @dof1985 dplyr (as with most tidyverse packages) already imports the pipe from magrittr, so you don't need to load that too. – alistaire Jul 12 '16 at 21:12
  • @alistaire, but you need to load `dplyr` with `library()` rather than using `dplyr::`. – dof1985 Jul 13 '16 at 06:37
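alistaire's comment can be expanded into a self-contained sketch. It assumes dplyr 1.0.0 or later (for the `.groups` argument of `summarise()`), and the final `rename()` to `From`/`To` is an addition here so the columns match the desired output in the question. `library(dplyr)` is loaded so that `%>%` is available, per the comments above.

```r
library(dplyr)

# Rebuild the example data from the question
network <- read.table(text = "
Actor Receiver Length
1 1 4
1 2 20
1 3 9
1 3 100
1 3 15
2 3 38
3 1 25
3 1 17", header = TRUE)

# One grouped pass: n() counts the rows in each Actor/Receiver group,
# and mean() averages Length within that same group
EDGE <- network %>%
  group_by(Actor, Receiver) %>%
  summarise(Count = n(), AverageLength = mean(Length), .groups = "drop") %>%
  rename(From = Actor, To = Receiver)  # new_name = old_name

print(EDGE)
```

Because both summaries are computed inside the same `summarise()` call, there is nothing to graft back together, and more per-pair statistics can be added as extra named arguments.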

1 Answer


A naive solution would be to use cbind() to join these two outputs together. Here is some example code:

Actor    <- c(rep(1, 5), 2, 3, 3)
Receiver <- c(1, 2, rep(3, 4), 1, 1)
Length   <- c(4, 20, 9, 100, 15, 38, 25, 17)

x <- data.frame("Actor" = Actor,
                "Receiver" = Receiver,
                "Length" = Length)

library(plyr)

# Both ddply() calls split x by the same Actor/Receiver groups in the same order, so their rows line up
EDGE <- cbind(ddply(x, .(Actor, Receiver), nrow),                           # replaces dplyr::count
              ddply(x, .(Actor, Receiver), summarize, mean(Length))[ , 3])  # per-pair mean of Length
names(EDGE) <- c("From", "To", "Count", "AverageLength")

EDGE   # Gives the expected results
  From To Count AverageLength
1    1  1     1       4.00000
2    1  2     1      20.00000
3    1  3     3      41.33333
4    2  3     1      38.00000
5    3  1     2      21.00000
dof1985
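The cbind() approach above relies on two separately grouped results coming back in the same row order. A single-pass alternative in base R (swapped in here, not part of the answer) extends David Arenburg's `aggregate()` comment, which on its own returns only the mean: `FUN` returns a named two-element vector per group, which `aggregate()` stores as a two-column matrix. The column names `From`, `To`, `Count` and `AverageLength` are chosen to match the question's desired output.

```r
# Example data from the question, built without extra packages
x <- data.frame(
  Actor    = c(rep(1, 5), 2, 3, 3),
  Receiver = c(1, 2, rep(3, 4), 1, 1),
  Length   = c(4, 20, 9, 100, 15, 38, 25, 17)
)

# One aggregate() call: FUN returns c(Count, AverageLength) for each pair,
# which aggregate() stores as a two-column matrix in the 'Length' column
res <- aggregate(Length ~ Actor + Receiver, data = x,
                 FUN = function(v) c(Count = length(v), AverageLength = mean(v)))

# Flatten the matrix column into ordinary columns
EDGE <- data.frame(From          = res$Actor,
                   To            = res$Receiver,
                   Count         = res$Length[, "Count"],
                   AverageLength = res$Length[, "AverageLength"])

# aggregate() varies the first grouping variable fastest, so sort by From, then To
EDGE <- EDGE[order(EDGE$From, EDGE$To), ]
rownames(EDGE) <- NULL
print(EDGE)
```

This needs no packages, and further per-pair statistics can be added as extra named elements of the vector returned by `FUN`.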