0

Assuming a dataframe as set out below:

A<-c("John","John","James","Brad")
B<-c("Deb","Deb","Henry","Suzie")
C<-c("Barry","Beth","Deb","Louise")
D<-c("Ben","Dory","John","Simon")
df<-data.frame(A,B,C,D)
df
      A     B      C     D
1  John   Deb  Barry   Ben
2  John   Deb   Beth  Dory
3 James Henry    Deb  John
4  Brad Suzie Louise Simon

How does one go about generating a frequency table showing the total number of times the combination of values in column A & B are found in the same row. The output for this would look like the following.

       A      B     n
1   Brad  Suzie     1
2  James  Henry     1
3   John    Deb     3

I'm aware of simple frequency tables using dplyr but I'm unable to get it to work in this scenario.

lebelinoz
  • 4,890
  • 10
  • 33
  • 56
Morts81
  • 419
  • 3
  • 13
  • Did you meant `library(dplyr);count(rbind(df[1:2], intersect(df[1:2], setNames(df[4:3], names(df)[1:2]))), A, B)` Related to [this](https://stackoverflow.com/questions/9809166/count-number-of-rows-within-each-group) – akrun Sep 19 '17 at 03:48
  • John Deb combination is 2? not 3 because you want only 2 columns right? – Hardik Gupta Sep 19 '17 at 04:29
  • I want to summarise the whole table based on the info in the first two columns. So the combination of John and Deb exists on three rows. – Morts81 Sep 19 '17 at 04:31
  • I am assuming that first form combination of all unique names present in column A and B and then check for number of rows where this pair is present across all columns, irrespective of sequence – Hardik Gupta Sep 19 '17 at 04:37
  • This is a method I've been playing around with too. But I'm struggling to work out how to count rows based on multiple criteria which may exist in different columns. – Morts81 Sep 19 '17 at 04:39

1 Answers1

0
df<-data.frame(A = c("John","John","James","Brad"),
               B = c("Deb","Deb","Henry","Suzie"),
               C = c("Barry","Beth","Deb","Louise"),
               D = c("Ben","Dory","John","Simon"), stringsAsFactors = F)

df$seq <- paste(df$A, df$B, df$C, df$D, sep = ",")

names <- unique(c(df$A,df$B))
pairs <- combn(names, 2)
finaldf <- data.frame(name1 = NULL, name2 = NULL, count = NULL)

for(i in 1:ncol(pairs)){
  name1 <- pairs[1,i]
  name2 <- pairs[2,i]
  count <- length(which( grepl(name1,df$seq) & grepl(name2,df$seq) ))

  finaldf <- rbind(finaldf, data.frame(name1 = name1, name2 = name2, count = count))

}

finaldf

> finaldf
name1 name2 count
1   John James     1
2   John  Brad     0
3   John   Deb     3
4   John Henry     1
5   John Suzie     0
6  James  Brad     0
7  James   Deb     1
8  James Henry     1
9  James Suzie     0
10  Brad   Deb     0
11  Brad Henry     0
12  Brad Suzie     1
13   Deb Henry     1
14   Deb Suzie     0
15 Henry Suzie     0
Hardik Gupta
  • 4,700
  • 9
  • 41
  • 83