I have 103 data frames, each with 7 variables and more than 1000 rows. For each data frame, I want to count how many times each pair of values from two of its columns occurs in the other 102 data frames. In other words, how many times the pair c(V1,V3)
(that is, the two column values taken together) appears in the other 102 data frames.
I have already written some code, but it is very slow!
I put all 103 data frames in a list and converted it to a single data frame. Then I loop over the data frames, reading them one by one, and inside that loop a second for loop searches the combined data frame for each row of the current data frame.
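For context, the step of combining the list into one table can be sketched as follows. This is only an illustration with toy data standing in for the real files; the column name `src` is a hypothetical addition (not part of my data) that records which data frame each row came from.

```r
library(data.table)

# Toy stand-ins for the real inputs; in practice each element would be
# the result of reading one of the 103 files.
frames <- list(
  data.table(V1 = c(1, 1, 2), V3 = c("abc", "abd", "zxc")),
  data.table(V1 = c(1, 2),    V3 = c("abc", "zxc")),
  data.table(V1 = c(1, 1),    V3 = c("abc", "bac"))
)

# Stack all frames into one table. idcol adds a column ("src", a
# hypothetical name) holding the position of the source frame in the list.
dt <- rbindlist(frames, idcol = "src")
print(dt)
```

Keeping the source index in the combined table makes it possible to tell matches in the same data frame apart from matches in the others.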
The main part of the code is as follows:
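library(data.table)
for (i in file) {                          # file: assumed to be the vector of input file names
  input <- read.table(i)
  for (j in 1:1000) {
    df1 <- data.table(input[j, c(1, 3)])   # the (V1, V3) pair of row j
    # dt is a data frame that includes all 103 data frames
    matches <- merge(df1, dt, by = c("V1", "V3"))
    df1[, count := nrow(matches)]          # number of rows in dt with the same pair
  }
}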
This way, I can count how many times each (V1, V3) pair of a data frame occurs in the other data frames. But obtaining the full results would take more than 50 days!
I wonder if anyone can help me with a faster way to obtain the results I want.
Example of the data frames (only 5 variables are shown here):
V1 V2 V3 V4 V5
1 Q0 abc 34 3
1 Q0 abd 31 9
1 Q0 bac 32 3
1 Q0 cba 56 0
2 Q0 zxc 37 3
2 Q0 fgc 30 3
2 Q0 ghc 36 3
In fact, I want to find out how many times each value of V3 occurs in the other data frames, but because V3 and V1 are dependent, I must consider V1 in my search as well. So I have to see how many times c(V1,V3)
occurs in the other data frames, for example (1, abc) together, or (1, abd).
dt has the same structure as the data frames, but it includes all the data from all the data frames that I have.
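To make the desired result concrete, here is a minimal sketch on toy data of the counts I am after, computed with grouped counts instead of a row-by-row merge. It rests on assumptions: dt carries a hypothetical `src` column identifying the source data frame, and "count" means the number of matching rows in the other data frames only (my loop above also counts matches in the same data frame). The columns `total`, `own` and `count` are names made up for the example.

```r
library(data.table)

# Toy combined table: src (hypothetical) marks the source data frame.
dt <- data.table(
  src = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
  V1  = c(1, 1, 2, 1, 2, 1, 1),
  V3  = c("abc", "abd", "zxc", "abc", "zxc", "abc", "bac")
)

# Rows sharing the same (V1, V3) pair across all data frames.
dt[, total := .N, by = .(V1, V3)]

# Rows sharing the same pair within the same source data frame.
dt[, own := .N, by = .(src, V1, V3)]

# Occurrences of the pair in the other data frames.
dt[, count := total - own]
print(dt)
```

In this toy table, (1, abc) appears once in each of the three data frames, so every (1, abc) row gets a count of 2, while (1, abd) appears only in the first data frame and gets 0.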