I am trying to bring rows from a list within the tibble I have based on conditions on different dates I have. I am hoping to solve this with tidyverse library.
Here is a type of a data I have:
table_age <- structure(list(id = c(1, 1, 2, 3, 4, 5, 6), age_band = c("5_9",
"5_9", "10_14", "15-19", "20-24", "5_9", "10_14"), start_date = c("2020-01-01 00:08",
"2020-02-01 00:00", "2020-01-08 10:08", "2020-01-02 17:08", "2020-01-08 16:08",
"2020-01-10 08:08", "2020-01-03 09:08"), end_date = c("2020-01-04 10:08",
"2020-02-11 00:00", "2020-01-09 10:08", "2020-01-03 19:08", "2020-01-11 16:08",
"2019-01-30 08:08", "2020-01-05 09:08")), row.names = c(NA, -7L
), class = c("tbl_df", "tbl", "data.frame"))
And it looks like this:
table_age
# A tibble: 7 x 4
id age_band start_date end_date
<dbl> <chr> <chr> <chr>
1 1 5_9 2020-01-01 00:08 2020-01-04 10:08
2 1 5_9 2020-02-01 00:00 2020-02-11 00:00
3 2 10_14 2020-01-08 10:08 2020-01-09 10:08
4 3 15-19 2020-01-02 17:08 2020-01-03 19:08
5 4 20-24 2020-01-08 16:08 2020-01-11 16:08
6 5 5_9 2020-01-10 08:08 2019-01-30 08:08
7 6 10_14 2020-01-03 09:08 2020-01-05 09:08
>
And a second data type I have is:
structure(list(id = c(1, 1, 2, 2, 3, 4, 5, 6), med_name_one = c("Co-amoxiclav",
"doxycycline", "Gentamicin", "Co-trimoxazole", "Gentamicin",
"Co-trimoxazole", "Sodium Chloride", "Piperacillin"), med_name_two = c(NA,
"Gentamicin", "Co-trimoxazole", NA, NA, NA, NA, NA), mg_one = c("411 mg",
"120 mg", "11280 mg", "8 mg", "11280 mg", "8 mg", "411 mg", "120 mg"
), mg_two = c(NA, "11280 mg", "8 mg", NA, NA, NA, NA, NA), administration_datetime = c("2020-01-03 10:08",
"2020-01-01 11:08", "2020-01-02 19:08", "2020-01-08 20:08", "2020-01-02 19:08",
"2020-01-08 20:08", "2019-01-30 08:08", "2020-01-03 09:08")), row.names = c(NA,
-8L), class = c("tbl_df", "tbl", "data.frame"))
And the look of it:
table
# A tibble: 8 x 6
id med_name_one med_name_two mg_one mg_two administration_datetime
<dbl> <chr> <chr> <chr> <chr> <chr>
1 1 Co-amoxiclav NA 411 mg NA 2020-01-03 10:08
2 1 doxycycline Gentamicin 120 mg 11280 mg 2020-01-01 11:08
3 2 Gentamicin Co-trimoxazole 11280 mg 8 mg 2020-01-02 19:08
4 2 Co-trimoxazole NA 8 mg NA 2020-01-08 20:08
5 3 Gentamicin NA 11280 mg NA 2020-01-02 19:08
6 4 Co-trimoxazole NA 8 mg NA 2020-01-08 20:08
7 5 Sodium Chloride NA 411 mg NA 2019-01-30 08:08
8 6 Piperacillin NA 120 mg NA 2020-01-03 09:08
Now the conditions in which I have to merge these 2 datasets are:
if the
administration_datetime
intable
is within the intervals ofstart_date
andend_date
fromtable_age
then merge all the info from both tiblesif the
administration_datetime
intable
is outside of the intervals ofstart_date
andend_date
fromtable_age
, keep id info fromtable_age
and giveNA
forthe rows in table
. The reason for keeping this types of rows is because they occur in table_age and do not want to loose the information from this specific table. It seems the opposite condition in point 3, down bellow but it is not.yet, if the
administration_datetime
in table is outside of both date time intervals in the table_age, then do not merge those rows at all. Because if I do, then the interval date_times will repeat itself. And I do not want this happening.
This is the type of table I want:
table_answer
# A tibble: 8 x 9
id med_name_one med_name_two mg_one mg_two administration_datetime age_band start_date end_date
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 Co-amoxiclav NA 411 mg NA 2020-01-03 10:08 5_9 2020-01-01 00:08 2020-01-04 10:08
2 1 NA NA NA NA NA 5_9 2020-02-01 00:00 2020-02-11 00:00
3 1 doxycycline Gentamicin 120 mg 11280 mg 2020-01-01 11:08 5_9 2020-01-01 00:08 2020-01-04 10:08
4 2 Co-trimoxazole NA 8 mg NA 2020-01-08 20:08 10_14 2020-01-08 10:08 2020-01-09 10:08
5 3 Gentamicin NA 11280 mg NA 2020-01-02 19:08 15-19 2020-01-02 17:08 2020-01-03 19:08
6 4 Co-trimoxazole NA 8 mg NA 2020-01-08 20:08 20-24 2020-01-08 16:08 2020-01-11 16:08
7 5 NA NA NA NA NA 5_9 2020-01-10 08:08 2019-01-30 08:08
8 6 Piperacillin NA 120 mg NA 2020-01-03 09:08 10_14 2020-01-03 09:08 2020-01-05 09:08
- I have tried to use the inner_join from tidyverse and did not succeed
table_joined <- inner_join(table_age, table)
Check this out:
table_joined
# A tibble: 10 x 9
id age_band start_date end_date med_name_one med_name_two mg_one mg_two administration_dateti…
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 5_9 2020-01-01 00:08 2020-01-04 10:… Co-amoxiclav NA 411 mg NA 2020-01-03 10:08
2 1 5_9 2020-01-01 00:08 2020-01-04 10:… doxycycline Gentamicin 120 mg 11280 … 2020-01-01 11:08
3 1 5_9 2020-02-01 00:00 2020-02-11 00:… Co-amoxiclav NA 411 mg NA 2020-01-03 10:08
4 1 5_9 2020-02-01 00:00 2020-02-11 00:… doxycycline Gentamicin 120 mg 11280 … 2020-01-01 11:08
5 2 10_14 2020-01-08 10:08 2020-01-09 10:… Gentamicin Co-trimoxazole 11280 … 8 mg 2020-01-02 19:08
6 2 10_14 2020-01-08 10:08 2020-01-09 10:… Co-trimoxazole NA 8 mg NA 2020-01-08 20:08
7 3 15-19 2020-01-02 17:08 2020-01-03 19:… Gentamicin NA 11280 … NA 2020-01-02 19:08
8 4 20-24 2020-01-08 16:08 2020-01-11 16:… Co-trimoxazole NA 8 mg NA 2020-01-08 20:08
9 5 5_9 2020-01-10 08:08 2019-01-30 08:… Sodium Chloride NA 411 mg NA 2019-01-30 08:08
10 6 10_14 2020-01-03 09:08 2020-01-05 09:… Piperacillin NA 120 mg NA 2020-01-03 09:08
I have tried to nest the
table
intotable_age
and then try to get the rows wanted with a function I have created and then apply themap
function frompurrr library
:table_nested <- nest_join(table_age, table) get_medication_name <- function(medication_name_df) { medication_name <- medication_name_df %>% dplyr::group_by(id) %>% dplyr::arrange(administered_datetime) %>% pull(med_name_one) } table_answer <- mutate(medication_name = purrr::map(table_nested, get_medication_name))
And yet as I am aware I do not put any conditions there, I have at least hoped to get a bit closer with this function. On top of this I get all sorts of errors.
Is there a way to achieve this, yet I will favor an outcome along the lines of my 2nd solution. Of course I can go with along other lines but with second I can build on something more extensive I have already.