0

Recently, I wanted to count the number of occurence of a relationship between 2 variables whatever the order of the relation is. In other words,

Variable1 Variable2
A          B
B          A

I should get

Variable1 Variable2 Weight 
A          B        2

I could tried something like

result <- aggregate(var_to_sum, by=list(var1,var2), FUN=sum, na.rm=TRUE)

However I dont know how to make A B equals to B A

Author1 <- ifelse(var1>var2,var1,var2)

But it does not work, any idea how to solve this problem?

richpiana
  • 411
  • 1
  • 7
  • 16
  • 1
    The stata code is not useful. Instead, describe what you want to do and add some example data. See [here](http://stackoverflow.com/q/5963269/4303162) for more information on how to do that in a good way. – Stibu Mar 27 '16 at 21:24
  • Sorry for that. Imagine that I have two variables var1 and var2, i want to count the number of relationships between var1 and var2. So in my example if var1 equals A and then B and var2=B and then A so i should only have one relationship with A B and a weight of 2 – richpiana Mar 27 '16 at 21:28
  • I understand that. But you should add example data to your question that potential answerers can work with. – Stibu Mar 27 '16 at 21:29
  • I tried to edit my question – richpiana Mar 27 '16 at 21:32
  • Can you sort the rows by any chance? If yes, I have an idea using plyr's count function. By sorting rows, make sure that A always comes before I, B comes before C, etc. – InfiniteFlash Mar 27 '16 at 21:46

3 Answers3

3

Here is an approach with data.table. We convert the 'data.frame' to 'data.table' (setDT(df1)). For values in "V1" that is greater than "V2", we assign "V2" as "V1" and viceversa. Grouped by "V1" and "V2", we get the nrow (.N) i.e. "weight".

library(data.table)
setDT(df1)[V1>V2, c("V2", "V1") := list(V1, V2)][, 
                            .(weight=.N) ,.(V1, V2)]
#   V1 V2 weight
#1:  A  B      2
#2:  C  D      2

data

df1 <- data.frame(V1=c("A", "B", "C", "D"), 
              V2=c("B", "A", "D", "C"), stringsAsFactors = FALSE)
akrun
  • 874,273
  • 37
  • 540
  • 662
2

Since you do not provide any sample data, I'll use this data set:

set.seed(1234)
data <- data.frame(var1 = sample(LETTERS[1:4], 30, replace = TRUE),
                   var2 = sample(LETTERS[1:4], 30, replace = TRUE),
                   stringsAsFactors = FALSE)

data is a data frame with two columns and all entries are "A", "B", "C", or "D".

First, I create a column that contains both variables in standard order and then count:

data <- transform(data, comb = ifelse(var1 < var2,
                                      paste(var1, var2, sep = ":"),
                                      paste(var2, var1, sep = ":")))
counts <- as.data.frame(table(data$comb))
counts
##   Var1 Freq
## 1  A:A    3
## 2  A:B    3
## 3  A:C    2
## 4  A:D    4
## 5  B:B    4
## 6  B:C    4
## 7  C:C    3
## 8  C:D    6
## 9  D:D    1

If you want to have an extra column for each of the variables again, you could for instance use the function separate() from the tidyr package:

counts_split <- tidyr::separate(counts, Var1, c("var1", "var2"), sep = ":")
counts_split
##   var1 var2 Freq
## 1    A    A    3
## 2    A    B    3
## 3    A    C    2
## 4    A    D    4
## 5    B    B    4
## 6    B    C    4
## 7    C    C    3
## 8    C    D    6
## 9    D    D    1
Stibu
  • 15,166
  • 6
  • 57
  • 71
1

A simple approach. You can use the pmin and pmax functions.

df1 <- data.frame(V1=c("A", "B", "C", "D"), 
                  V2=c("B", "A", "D", "C"), stringsAsFactors = FALSE)

library(dplyr)

df1 %>% transmute(V1n=pmin(V1, V2), V2n=pmax(V1, V2)) %>% 
  group_by(V1n, V2n) %>% summarise(weight=n())
# Source: local data frame [2 x 3]
# Groups: V1n [?]
# 
#     V1n   V2n weight
#   (chr) (chr)  (int)
# 1     A     B      2
# 2     C     D      2
Ven Yao
  • 3,680
  • 2
  • 27
  • 42