I have a large dataset, over 1.5 million rows, from 600k unique subjects, so a number of subjects have multiple rows. I am trying to find the cases where a subject has a DOB entered incorrectly, i.e. where the DOB is not the same across all of that subject's rows.
test <- data.frame(
  ID  = c(rep(1, 3), rep(2, 4), rep(3, 2)),
  DOB = c(rep("2000-03-01", 3), "2000-05-06", "2002-05-06",
          "2000-05-06", "2000-05-06", "2004-04-06", "2004-04-06")
)
> test
  ID        DOB
1  1 2000-03-01
2  1 2000-03-01
3  1 2000-03-01
4  2 2000-05-06
5  2 2002-05-06
6  2 2000-05-06
7  2 2000-05-06
8  3 2004-04-06
9  3 2004-04-06
What I am after is some code to identify that ID 2 has an error, because one of its rows has a different DOB from the others. I can think of some roundabout ways using a for loop, but that would be computationally inefficient on a dataset this size.
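To make the target concrete, here is a minimal loop-free sketch in base R. It assumes an "error" means an ID has more than one distinct DOB. The names `pairs`, `bad_ids` and `flagged` are only illustrative, and this may not be the fastest option at 1.5 million rows.

```r
# Example data from the question
test <- data.frame(
  ID  = c(rep(1, 3), rep(2, 4), rep(3, 2)),
  DOB = c(rep("2000-03-01", 3), "2000-05-06", "2002-05-06",
          "2000-05-06", "2000-05-06", "2004-04-06", "2004-04-06")
)

# Keep one row per distinct (ID, DOB) combination
pairs <- unique(test[, c("ID", "DOB")])

# An ID that still appears more than once has conflicting DOBs
bad_ids <- unique(pairs$ID[duplicated(pairs$ID)])

# All rows belonging to the flagged IDs, for manual inspection
flagged <- test[test$ID %in% bad_ids, ]
```

On the example data this should flag only ID 2. It shows which subjects have conflicting DOBs, not which of the entries is the wrong one.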
Thanks