
I have two data frames (a1 and a2).

The first (a1) is an original dataset, and the second (a2) is the same dataset except that data has been appended to some of its records. I want to count how many records contain appended data; I don't need to view the records themselves.

What is the best way to get just the number of records in a2 that differ from a1?

  • It's not clear what you're asking; a simple example would help. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – IceCreamToucan Jun 29 '18 at 16:08
  • Please provide a [mcve]. From your description it is not clear what the difference between dataframes `a1` and `a2` is, especially what you mean by *contains data that has been appended to some records*. Thank you. – Uwe Jun 29 '18 at 16:35

2 Answers


OK, first let me make sure I understand. You want to compare two data frames and find the number of rows that differ between them.

Using dplyr

> library(dplyr)

> a1
  a b
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e

> a2
  a b
1 1 a
2 2 b
3 3 c

> df <- setdiff(a1, a2)
> df
  a b
1 4 d
2 5 e

> nrow(df)
[1] 2

Is this what you are looking for?
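For reference, here is a self-contained version of the session above that can be run as a script, using the same toy data. dplyr has to be loaded so that `setdiff()` compares the data frames row by row; `base::setdiff()` does not do that. Note the argument order: `setdiff(x, y)` keeps rows of `x` that have no identical row in `y`. The reversed call at the end, counting rows of a2 that are absent from a1, is an assumption about the direction the question asks for and is not part of the original answer.

```r
# dplyr provides a row-wise setdiff() method for data frames
library(dplyr)

# Toy data from the answer above (stringsAsFactors = FALSE keeps b as character)
a1 <- data.frame(a = 1:5, b = c("a", "b", "c", "d", "e"), stringsAsFactors = FALSE)
a2 <- data.frame(a = 1:3, b = c("a", "b", "c"), stringsAsFactors = FALSE)

# Rows of a1 with no identical row in a2
df <- setdiff(a1, a2)
n_missing <- nrow(df)
print(n_missing)

# Reversed order: rows of a2 with no identical row in a1.
# With this toy data a2 is a subset of a1, so the count is 0.
n_changed <- nrow(setdiff(a2, a1))
print(n_changed)
```

With data where a2 holds modified copies of a1's records, the reversed call counts the modified records.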

ReKx

Using anti_join from dplyr: anti_join(a2, a1) returns the records of a2 that have no exact match in a1, and tally() counts those rows.

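library(dplyr)

a2 %>% 
  anti_join(a1) %>%   # rows of a2 with no exact match in a1 (joins on all shared columns)
  tally()             # count those rows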
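A self-contained sketch of the same pipeline, using small hypothetical versions of a1 and a2 in which a2 has the same records as a1 but two of them have text appended (an assumption about what the question means; the column names `id` and `note` are made up for illustration). Passing `by = names(a1)` makes the default "join on every shared column" explicit, so a record counts as changed if any of its values differs.

```r
library(dplyr)

# Hypothetical data: a2 is a1 with text appended to two records
a1 <- data.frame(id = 1:4, note = c("x", "y", "z", "w"), stringsAsFactors = FALSE)
a2 <- data.frame(id = 1:4, note = c("x", "y+extra", "z", "w+extra"),
                 stringsAsFactors = FALSE)

# Keep rows of a2 with no exact match in a1 on all columns, then count them
changed <- a2 %>%
  anti_join(a1, by = names(a1)) %>%
  tally()

print(changed)          # one-row data frame; the count is in column n
n_changed <- changed$n
```

If only the number is needed, `nrow(anti_join(a2, a1, by = names(a1)))` returns it as a plain integer instead of a one-row data frame.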
phiver