Assume we have multiple data frames, say df1, df2, df3, ... What is the most efficient way in R to count the number of rows that are identical across multiple data frames? Nested loops are surely not the answer, right?
Thanks
df1=data.frame(A=11:13,B=111:113)
df2=data.frame(A=22:24,B=222:224)
df3=data.frame(A=c(33:35,11),B=c(333:335,111))
If you are happy to bind the data frames manually:
> df = rbind(df1,df2,df3)
(otherwise you can also use):
> df = do.call(what=rbind,args=mget(paste("df",1:3,sep="")))
Then
> library(plyr)
> ddply(.data=df,.variables=colnames(df),.fun=nrow)
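The 3rd column (named V1 by default) is the number of times each row occurs in the combined data frame, so rows with a count above 1 are the repeated ones. Note that a row duplicated within a single data frame is counted more than once too.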
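If you would rather not depend on plyr, the same count can be sketched in base R. This is only one possible approach, not the method from the answer above: the names dfs, stacked, counts and shared are made up for illustration, and it assumes all data frames have the same columns and no NA values (the formula interface of aggregate() drops rows containing NA).

```r
# Example data from the question
df1 <- data.frame(A = 11:13, B = 111:113)
df2 <- data.frame(A = 22:24, B = 222:224)
df3 <- data.frame(A = c(33:35, 11), B = c(333:335, 111))

# Hypothetical container for the frames to compare
dfs <- list(df1 = df1, df2 = df2, df3 = df3)

# Drop duplicates inside each frame first, so that every frame
# contributes a given row at most once, then stack everything
stacked <- do.call(rbind, lapply(dfs, unique))

# Add a helper column of ones and sum it per distinct row;
# n is then the number of data frames containing that row
counts <- aggregate(n ~ ., data = cbind(stacked, n = 1L), FUN = sum)

# Rows that appear in more than one data frame
shared <- counts[counts$n > 1, ]
nrow(shared)
```

With the example data, shared should hold the single row A = 11, B = 111, which occurs in df1 and df3.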
A bit hacky, but should work:
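df1$comp <- paste(df1$V1,df1$V2,df1$V3,..., df1$Vn, sep="_")
df2$comp <- paste(df2$V1,df2$V2,df2$V3,..., df2$Vn, sep="_")
etc. Use a separator that cannot occur in the data: with sep="" the rows (1, 23) and (12, 3) would both collapse to "123" and be treated as identical.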
Then
# Number of complete rows in df1 that are in df2 (the TRUE count in the summary).
summary(df1$comp %in% df2$comp)
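The key idea generalises to any number of data frames without typing out the columns. The following is a rough sketch under a few assumptions: row_key is a hypothetical helper (not part of any package), all frames share the same columns in the same order, and the separator "\r" never occurs in the data.

```r
# Example data from the question
df1 <- data.frame(A = 11:13, B = 111:113)
df2 <- data.frame(A = 22:24, B = 222:224)
df3 <- data.frame(A = c(33:35, 11), B = c(333:335, 111))

# Hypothetical helper: paste all columns of a data frame into one
# key string per row, separated by a character unlikely to be in the data
row_key <- function(df) do.call(paste, c(df, sep = "\r"))

# One character vector of keys per data frame
keys <- lapply(list(df1, df2, df3), row_key)

# Number of distinct rows of df1 that also occur in df3
n_df1_in_df3 <- sum(unique(keys[[1]]) %in% keys[[3]])

# Number of distinct rows present in every data frame
n_in_all <- length(Reduce(intersect, keys))

n_df1_in_df3
n_in_all
```

sum() on the logical vector gives the count directly, whereas summary() prints the FALSE and TRUE tallies. Because the comparison is done on strings, numeric columns that differ only by floating point noise will not match.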