I have two tables: One with participants and one with an encoding of scores based on birth dates. The score table looks like this:
score_table
Key | Value
--------------------
01/01/1900 | 15
01/01/1940 | 25
01/01/1950 | 30
All participants with birth dates between 01/01/1900 and 01/01/1940 should get a score of 15. Participants born between 01/01/1940 and 01/01/1950 should get a score of 25, etc.
My participants' table looks like this:
participant_table
BirthDate | Gender
-----------------------
05/05/1930 | M
02/07/1954 | V
01/11/1941 | U
I would like to add a score to get the output table:
BirthDate | Gender | Score
------------------------------------
05/05/1930 | M | 15
02/07/1954 | V | 30
01/11/1941 | U | 25
I built several solutions for similar problems when the exact values are in the score table (using dplyr::left_join or base::match) or for numbers which can be rounded to another value. Here, the intervals are irregular and the dates arbitrary.
I know I can build a solution by iterating through the score table, using this method:
as.Date("05/05/1930", format="%d/%m/%Y) < as.Date("01/01/1900", format="%d/%m/%Y)
Which returns a Boolean and thus allows me to walk through the scores until I find a date which is bigger and then use the last score. However, there must be a better way to do this.
Maybe I can create some sort of bins from the dataframe, as such:
Bin 1 | Bin 2 | Bin 3
Date 1 : Date 2 | Date 2 : Date 3 | Date 3 : inf
But I don't yet see how. Does anyone see an efficient way to create such bins from a dataframe, so that I can efficiently retrieve scores from this table?
MRE:
Score table:
structure(list(key=c("1/1/1900", "2/1/2013", "2/1/2014","2/1/2015", "4/1/2016", "4/1/2017"), value=c(65,65,67,67,67,68)), row.names=1:6, class="data.frame")
Participant File:
structure(list(birthDate=c("10/10/1968", "6/5/2015","10/10/2017"), Gender=c("M", "U", "F")), row.names=1:3, class="data.frame")
Goal File:
structure(list(birthDate=c("10/10/1968", "6/5/2015","10/10/2017"), Gender=c("M", "U", "F"), Score = c(65,67,68)), row.names=1:3, class="data.frame")