I am working with cattle fertility data. In one table (data frame), what I have is a record of all the services performed in a cow (like inseminations). In a different table, I get the pregnancy diagnosis (positive or negative). Both have an unique ID (animal_id). My challenge has been successfully merging both tables in the right data range, meaning what I need is the pregnancy check associated with the right insemination record. Here is a sample of how both tables look like,
animal_id service_date
610710 2005-10-22
610710 2006-12-03
610710 2006-12-27
610710 2007-12-02
610710 2008-01-17
610710 2008-03-04
The other table is the same but with a different date (event_date) and the diagnosis,
animal_id event_date event_description
610710 2006-06-16 PP
610710 2007-02-15 PP
610710 2008-01-09 PN
610710 2008-04-09 PP
610710 2009-06-16 PP
So what I would like to do is merge both tables in a way the dates complement each other, meaning if a service was performed on 2005-10-12, when I join both tables this row will link to the closest date in the Events table, and by closest I also mean later - since insemination happens before diagnosis. So the desired output would be something like this,
animal_id service_date event_date event_description
1 610710 2005-10-22 NA NA
2 610710 NA 2006-06-16 PP
3 610710 2006-12-03 2007-02-15 PP
4 610710 2006-12-27 2007-02-15 PP
5 610710 2007-12-02 2008-01-09 PN
6 610710 2008-01-17 2008-04-09 PP
7 610710 2008-03-04 NA NA
8 610710 NA 2009-06-16 PP
In the final output, I would expect a large number of records not to merge against anything, like row 1 in the example output. There was a service performed in October 2005, but the first Diagnosis I have for that cow is in June 2006 - there are probably a number of service records missing. That is unfortunately to be expected. For this example, only rows 5 and 6 make sense. For rows 3 and 4, I would consider only row 4, since that is probably the insemination that resulted into pregnancy.
Is that even possible in R?
Thank you!