
I am new to R and have a problem that I can't find a solution to anywhere online.

I want to compare 3 data frames and find out whether they contain exactly the same data. If they do not, I want the output to tell me which rows have mismatched data. The closest I found was an approach for two data sets: merge them together, then look for duplicate rows using the dupsBetweenGroups function. I could not find anything that handles three data frames at once.

Here is an example of the data I want to compare:

DataFrameA:

    Date    Time    pH
1   10/8    600     3.85
2   10/9    800     4.05
3   10/10   1300    3.95

DataFrameB:

     Date    Time    pH
1    10/8    600     3.85
2    10/12   900     4.05
3    10/10   1300    3.95

DataFrameC:

     Date    Time    pH
1    10/8    600     8.85
2    10/9    800     4.05
3    10/10   1300    3.95

If the output could return TRUE or FALSE for each row, depending on whether that row is the same in A, B, and C, that would be perfect.

Any pointers on where to start, or any good reading on this subject, would be much appreciated.

joran
user2920249
  • Take the code you used to merge the first two data frames, then use the same approach to merge in the third. Then check `pH2 == pH & pH3 == pH`. If you are still having trouble, post the code and someone will help you. – C8H10N4O2 Sep 14 '15 at 19:37
  • A quick way to check if they're the same is `identical(DataFrameA, DataFrameB)`. If they are the same, you're done, if they're not then you can look for the differences. – Gregor Thomas Sep 14 '15 at 19:40
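The two suggestions in the comments can be sketched as follows. This is a minimal sketch, assuming rows should be matched by their position in each table: the `row_id` column and the `add_id()` helper are hypothetical names introduced only for this illustration. Merging on `Date` and `Time` instead is another reading of the comment, but an inner merge on those keys would silently drop rows whose keys differ (such as row 2 of DataFrameB).

```r
# Example data from the question (Date kept as character, not factor)
DataFrameA <- data.frame(Date = c("10/8", "10/9", "10/10"),
                         Time = c(600, 800, 1300),
                         pH   = c(3.85, 4.05, 3.95),
                         stringsAsFactors = FALSE)
DataFrameB <- data.frame(Date = c("10/8", "10/12", "10/10"),
                         Time = c(600, 900, 1300),
                         pH   = c(3.85, 4.05, 3.95),
                         stringsAsFactors = FALSE)
DataFrameC <- data.frame(Date = c("10/8", "10/9", "10/10"),
                         Time = c(600, 800, 1300),
                         pH   = c(8.85, 4.05, 3.95),
                         stringsAsFactors = FALSE)

# Quick whole-table check: TRUE only if all three are exactly the same
all_identical <- identical(DataFrameA, DataFrameB) &&
  identical(DataFrameB, DataFrameC)

# Hypothetical helper: add a row_id column holding each row's position
add_id <- function(df) {
  df$row_id <- seq_len(nrow(df))
  df
}

# Merge A with B, then the result with C, matching rows on row_id;
# columns from B get the suffix "2" and columns from C get the suffix "3"
ab  <- merge(add_id(DataFrameA), add_id(DataFrameB),
             by = "row_id", suffixes = c("", "2"))
abc <- merge(ab, add_id(DataFrameC), by = "row_id", suffixes = c("", "3"))

# A row matches only if every column agrees across the three data frames
same <- with(abc, Date == Date2 & Date == Date3 &
                  Time == Time2 & Time == Time3 &
                  pH == pH2 & pH == pH3)
same                # FALSE FALSE TRUE
abc$row_id[!same]   # positions of the mismatched rows: 1 2
```

`identical()` is the cheap first test; the merge is only needed when it returns FALSE and you want to know which rows differ.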

2 Answers


This might not be the cleanest way, but it should work:

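compare3row <- function(data1, data2, data3) {
  # all.equal() returns TRUE or a character vector describing the differences,
  # so isTRUE() turns its result into a single TRUE/FALSE.
  # Note: factor columns with different level sets never compare as equal,
  # so read the data with stringsAsFactors = FALSE.
  isTRUE(all.equal(data1, data2)) && isTRUE(all.equal(data2, data3))
}

# Compare row n of each data frame (assumes all three have the same number of rows)
sapply(seq_len(nrow(DataFrameA)), function(n) {
  compare3row(DataFrameA[n, ], DataFrameB[n, ], DataFrameC[n, ])
})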
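The same row-by-row result can also be computed without a loop. This sketch assumes all three data frames have the same dimensions, the same row order and no NA values (a row containing NA gives NA instead of TRUE/FALSE): comparing two equally sized data frames with `!=` returns a logical matrix, and a row matches when none of its cells differ. `rows_equal()` is a hypothetical helper name.

```r
# Example data from the question (Date kept as character, not factor)
DataFrameA <- data.frame(Date = c("10/8", "10/9", "10/10"),
                         Time = c(600, 800, 1300),
                         pH   = c(3.85, 4.05, 3.95),
                         stringsAsFactors = FALSE)
DataFrameB <- data.frame(Date = c("10/8", "10/12", "10/10"),
                         Time = c(600, 900, 1300),
                         pH   = c(3.85, 4.05, 3.95),
                         stringsAsFactors = FALSE)
DataFrameC <- data.frame(Date = c("10/8", "10/9", "10/10"),
                         Time = c(600, 800, 1300),
                         pH   = c(8.85, 4.05, 3.95),
                         stringsAsFactors = FALSE)

# Hypothetical helper: x != y compares two equally sized data frames cell by
# cell; rowSums() counts the differing cells in each row
rows_equal <- function(x, y) rowSums(x != y) == 0

# TRUE for rows that are the same in all three data frames
same_row <- rows_equal(DataFrameA, DataFrameB) & rows_equal(DataFrameB, DataFrameC)
same_row          # named logical vector: FALSE FALSE TRUE
which(!same_row)  # mismatched rows: 1 2
```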

One approach is to compare the data frames in pairs. Here is a summary of several options from this post, using df1 and df2 as an example.

Package compare, if you only need TRUE or FALSE:

library(compare)
compare(df1, df2)

Output (the overall result, followed by one result per column: Date, Time, pH):

FALSE [FALSE, FALSE, TRUE]

Package sqldf, to use SQL:

library(sqldf)
# Different rows
sqldf('SELECT * FROM df1 EXCEPT SELECT * FROM df2')
  Date Time   pH
1 10/9  800 4.05
# Common rows
sqldf('SELECT * FROM df1 INTERSECT SELECT * FROM df2')
   Date Time   pH
1 10/10 1300 3.95
2  10/8  600 3.85
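The INTERSECT idea extends to three data frames. SQLite accepts chained compound selects such as `SELECT * FROM df1 INTERSECT SELECT * FROM df2 INTERSECT SELECT * FROM df3`. A base R sketch of the same idea folds `merge()` over the list with `Reduce()`; with no `by` argument, `merge()` joins on all shared columns, which keeps only rows present in every data frame. This assumes the columns have the same names and types everywhere, and note that, unlike INTERSECT, `merge()` does not remove duplicate rows.

```r
# Example data: df1, df2 and df3 hold the question's three tables
df1 <- data.frame(Date = c("10/8", "10/9", "10/10"),
                  Time = c(600, 800, 1300),
                  pH   = c(3.85, 4.05, 3.95),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Date = c("10/8", "10/12", "10/10"),
                  Time = c(600, 900, 1300),
                  pH   = c(3.85, 4.05, 3.95),
                  stringsAsFactors = FALSE)
df3 <- data.frame(Date = c("10/8", "10/9", "10/10"),
                  Time = c(600, 800, 1300),
                  pH   = c(8.85, 4.05, 3.95),
                  stringsAsFactors = FALSE)

# merge(merge(df1, df2), df3): an inner join on Date, Time and pH,
# so only rows whose values match in all three data frames survive
common <- Reduce(merge, list(df1, df2, df3))
common  # one row: 10/10, 1300, 3.95
```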

Package dplyr:

library(dplyr)
# Different rows
anti_join(df1, df2)
  Date Time   pH
1 10/9  800 4.05
# Common rows
semi_join(df1, df2)
  Date Time   pH
1 10/10 1300 3.95
2  10/8  600 3.85
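The semi/anti-join idea can also be applied against two other data frames at once in base R. This is a sketch, assuming the columns appear in the same order in every data frame: each row is collapsed into one string key and membership is tested with `%in%`. `row_key()` is a hypothetical helper; because the keys are built with `paste()`, numbers that differ only beyond about 15 significant digits would compare as equal.

```r
# Example data: df1, df2 and df3 hold the question's three tables
df1 <- data.frame(Date = c("10/8", "10/9", "10/10"),
                  Time = c(600, 800, 1300),
                  pH   = c(3.85, 4.05, 3.95),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Date = c("10/8", "10/12", "10/10"),
                  Time = c(600, 900, 1300),
                  pH   = c(3.85, 4.05, 3.95),
                  stringsAsFactors = FALSE)
df3 <- data.frame(Date = c("10/8", "10/9", "10/10"),
                  Time = c(600, 800, 1300),
                  pH   = c(8.85, 4.05, 3.95),
                  stringsAsFactors = FALSE)

# Hypothetical helper: one string per row, fields separated by "\r"
row_key <- function(df) do.call(paste, c(df, sep = "\r"))

key1 <- row_key(df1)
# TRUE for rows of df1 that also appear somewhere in df2 and in df3
in_all <- key1 %in% row_key(df2) & key1 %in% row_key(df3)

df1[in_all, ]   # semi-join style: rows of df1 found in both (row 3)
df1[!in_all, ]  # anti-join style: rows of df1 missing from df2 or df3 (rows 1 and 2)
```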

Data

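# df1, df2 and df3 are DataFrameA, DataFrameB and DataFrameC from the question.
# The header has one field fewer than the data rows, so the first column becomes the row names.
# stringsAsFactors = FALSE keeps Date as character (the default only since R 4.0.0).
df1 <- read.table(text="Date    Time    pH
                  1   10/8    600     3.85
                  2   10/9    800     4.05
                  3   10/10   1300    3.95",
                  header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text="Date    Time    pH
                  1    10/8    600     3.85
                  2    10/12   900     4.05
                  3    10/10   1300    3.95",
                  header = TRUE, stringsAsFactors = FALSE)
df3 <- read.table(text="Date    Time    pH
                  1    10/8    600     8.85
                  2    10/9    800     4.05
                  3    10/10   1300    3.95",
                  header = TRUE, stringsAsFactors = FALSE)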
Community
mpalanco