Count number of relations between 2 variables

Question

Recently, I wanted to count the number of occurence of a relationship between 2 variables whatever the order of the relation is. In other words,

Variable1 Variable2
A          B
B          A

I should get

Variable1 Variable2 Weight 
A          B        2

I could tried something like

result <- aggregate(var_to_sum, by=list(var1,var2), FUN=sum, na.rm=TRUE)

However I dont know how to make A B equals to B A

Author1 <- ifelse(var1>var2,var1,var2)

But it does not work, any idea how to solve this problem?

The stata code is not useful. Instead, describe what you want to do and add some example data. See [here](http://stackoverflow.com/q/5963269/4303162) for more information on how to do that in a good way. — Stibu, Mar 27 '16 at 21:24
Sorry for that. Imagine that I have two variables var1 and var2, i want to count the number of relationships between var1 and var2. So in my example if var1 equals A and then B and var2=B and then A so i should only have one relationship with A B and a weight of 2 — richpiana, Mar 27 '16 at 21:28
I understand that. But you should add example data to your question that potential answerers can work with. — Stibu, Mar 27 '16 at 21:29
Can you sort the rows by any chance? If yes, I have an idea using plyr's count function. By sorting rows, make sure that A always comes before I, B comes before C, etc. — InfiniteFlash, Mar 27 '16 at 21:46

score 3 · Answer 1 · answered Mar 28 '16 at 02:53

Here is an approach with data.table. We convert the 'data.frame' to 'data.table' (setDT(df1)). For values in "V1" that is greater than "V2", we assign "V2" as "V1" and viceversa. Grouped by "V1" and "V2", we get the nrow (.N) i.e. "weight".

library(data.table)
setDT(df1)[V1>V2, c("V2", "V1") := list(V1, V2)][, 
                            .(weight=.N) ,.(V1, V2)]
#   V1 V2 weight
#1:  A  B      2
#2:  C  D      2

data

df1 <- data.frame(V1=c("A", "B", "C", "D"), 
              V2=c("B", "A", "D", "C"), stringsAsFactors = FALSE)

Stibu · Accepted Answer · 2016-03-28T06:42:52.940

Since you do not provide any sample data, I'll use this data set:

set.seed(1234)
data <- data.frame(var1 = sample(LETTERS[1:4], 30, replace = TRUE),
                   var2 = sample(LETTERS[1:4], 30, replace = TRUE),
                   stringsAsFactors = FALSE)

data is a data frame with two columns and all entries are "A", "B", "C", or "D".

First, I create a column that contains both variables in standard order and then count:

data <- transform(data, comb = ifelse(var1 < var2,
                                      paste(var1, var2, sep = ":"),
                                      paste(var2, var1, sep = ":")))
counts <- as.data.frame(table(data$comb))
counts
##   Var1 Freq
## 1  A:A    3
## 2  A:B    3
## 3  A:C    2
## 4  A:D    4
## 5  B:B    4
## 6  B:C    4
## 7  C:C    3
## 8  C:D    6
## 9  D:D    1

If you want to have an extra column for each of the variables again, you could for instance use the function separate() from the tidyr package:

counts_split <- tidyr::separate(counts, Var1, c("var1", "var2"), sep = ":")
counts_split
##   var1 var2 Freq
## 1    A    A    3
## 2    A    B    3
## 3    A    C    2
## 4    A    D    4
## 5    B    B    4
## 6    B    C    4
## 7    C    C    3
## 8    C    D    6
## 9    D    D    1

Thanks for your great answer. Sorry my question was not very clear. But the way you described and explained me is great. I wish you a nice evening — richpiana, Mar 27 '16 at 21:54
@branchwarren What your question lacked was some exampke data as I provide in the first code snippet of my answer. — Stibu, Mar 27 '16 at 21:56
I will take your response as an example for my future questions — richpiana, Mar 27 '16 at 21:57

score 1 · Answer 3 · answered Mar 28 '16 at 02:26

A simple approach. You can use the pmin and pmax functions.

df1 <- data.frame(V1=c("A", "B", "C", "D"), 
                  V2=c("B", "A", "D", "C"), stringsAsFactors = FALSE)

library(dplyr)

df1 %>% transmute(V1n=pmin(V1, V2), V2n=pmax(V1, V2)) %>% 
  group_by(V1n, V2n) %>% summarise(weight=n())
# Source: local data frame [2 x 3]
# Groups: V1n [?]
# 
#     V1n   V2n weight
#   (chr) (chr)  (int)
# 1     A     B      2
# 2     C     D      2

Count number of relations between 2 variables

3 Answers3

data