I'm new to R. I have a data frame with more than 500,000 rows containing patient IDs, dates and other variables.
I want to remove any repeated patient ID (PtID) row that falls within one year of that patient's first appearance. For example:
Row  PtID  date
1.   1     01/01/2006
2.   2     01/01/2006
3.   1     24/02/2006
4.   4     26/03/2006
5.   1     04/05/2006
6.   1     05/05/2007
In this case I want to remove the 3rd and 5th rows and keep the 1st and 6th rows, because the 6th row is more than one year after patient 1's first appearance.
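To make the example reproducible, here is a minimal base R sketch of the rule I have in mind. The object names (`example`, `first_date`, `days_since_first`, `keep`, `result`) are made up for illustration. I am assuming that "one year" means 365 days and that the window is always measured from the patient's first appearance, so it does not restart after a kept row. I don't know whether this is the right or the fastest approach for the full data.

```r
# Toy data matching the example above (dates are day/month/year)
example <- data.frame(
  PtID = c(1L, 2L, 1L, 4L, 1L, 1L),
  Date = as.Date(c("01/01/2006", "01/01/2006", "24/02/2006",
                   "26/03/2006", "04/05/2006", "05/05/2007"),
                 format = "%d/%m/%Y")
)

# Sort by date so the first row seen for each patient is the earliest one
example <- example[order(example$Date), ]

# Earliest date per patient, repeated on every row of that patient
first_date <- ave(as.numeric(example$Date), example$PtID, FUN = min)

# Days elapsed since that patient's first appearance
days_since_first <- as.numeric(example$Date) - first_date

# Keep each patient's first row, plus any row more than 365 days after it
# (assumption: "one year" = 365 days, measured from the first appearance)
keep <- !duplicated(example$PtID) | days_since_first > 365
result <- example[keep, ]
print(result)
```

On the toy data this should leave rows 1, 2, 4 and 6, which is the result I'm after.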
Can somebody help me with this, please? Here is the str() output of my data, which is called final1:
str(final1)
'data.frame': 605870 obs. of 70 variables:
...
$ Date : Date, format: "2006-03-12" "2006-04-01" ...
$ PtID : int 11251 11251 11251 11251 11251 11251 11251 30938 30938 11245 ...
...