0

I would like to merge two databases with inner and left joint based on n_pz and dates.

In db_1 I have date of admission and discharge.

In db_2 I have date of exams.

I need a database with only the patients that had exams during the recovery (inner joint) and an other database with all the patients that had recovery complete with exams done during the recovery (left joint).

Here's the problem. I have date of admission and disharge in db_1 and I have to keep only the record from db_2 with dates between the date of admission and that of discharge.

db_1:

  n_pz admission_date discharge_date
1    1     02/01/2000     07/01/2000
2    2     15/01/2000     16/01/2000
3    3     20/02/2000     27/02/2000
4    4     27/02/2000     29/02/2000
5    5     10/03/2000     13/03/2000
6    6     20/04/2000     25/04/2000

db_2


    n_pz buffer_date buffer_result
1     1  01/01/2000             0
2     1  02/01/2000             1
3     1  03/01/2000             0
4     1  05/01/2000             0
5     2  15/01/2000             1
6     2  17/01/2000             1
7     3  25/02/2000             0
8     4  26/02/2000             1
9     5  15/03/2000             0
10    6  17/03/2000             1
11    6  24/04/2000             0

I don't know how to use merge in this case cause I don't have the same variable to merge..

I would like to obtain:

out_db (left joint)

n_pz    admission_date  discharge_date  buffer_date buffer_result
  1 02/01/2000  07/01/2000  02/01/2000  1
  1 02/01/2000  07/01/2000  03/01/2000  0
  1 02/01/2000  07/01/2000  05/01/2000  0
  2 15/01/2000  16/01/2000  15/01/2000  1
  3 20/02/2000  27/02/2000  25/02/2000  0
  4 27/02/2000  29/02/2000      
  5 10/03/2000  13/03/2000      
  6 20/04/2000  25/04/2000  24/04/2000  0

out_db (inner joint)

n_pz    admission_date  discharge_date  buffer_date buffer_result
  1 02/01/2000  07/01/2000  02/01/2000  1
  1 02/01/2000  07/01/2000  03/01/2000  0
  1 02/01/2000  07/01/2000  05/01/2000  0
  2 15/01/2000  16/01/2000  15/01/2000  1
  3 20/02/2000  27/02/2000  25/02/2000  0
  6 20/04/2000  25/04/2000  24/04/2000  0

I've tried in ths way but it didn't work:

fmt <- "%d/%m/%Y"
db_11 <- transform(db_1, admission_date = as.Date(admission_date, fmt),
                   discharge_date = as.Date(discharge_date, fmt))
                   
                   
db_21 <- transform(db_2, buffer_date = as.Date(buffer_date, fmt))
                  
s <- subset(merge(db_11, db_21,by = "n_pz", all.x = TRUE), admission_date <= buffer_date & discharge_date >= buffer_date)
merge(db_11, s, by = "n_pz",all.x = TRUE) 

I obtain a db with 0 observations.

I've tried also in this way:

fmt <- "%d/%m/%Y"
db_11 <- transform(db_1, admission_date = as.Date(admission_date, fmt),
                   discharge_date = as.Date(discharge_date, fmt))                 
db_21 <- transform(db_2, buffer_date = as.Date(buffer_date, fmt))
df3 <- dplyr::left_join(db_11, db_21, by=c("n_pz"," admission_date " <= " buffer_date " & " discharge_date " >= " buffer_date "))

But I obtain this error message:

Errore: Join columns must be present in data.
x Problem with `FALSE`.

Then I would like to transpose multiple data for dates and results of exams.. but maybe I should create a different post for this..

out db (left joint) with transpose

 n_pz   admission_date  discharge_date  buffer_date buffer_result   buffer_date_1   buffer_result_1 buffer_date_1   buffer_result_1
  1 02/01/2000  07/01/2000  02/01/2000  1   03/01/2000  0   05/01/2000  0
  2 15/01/2000  16/01/2000  15/01/2000  1               
  3 20/02/2000  27/02/2000  25/02/2000  0               
  4 27/02/2000  29/02/2000                      
  5 10/03/2000  13/03/2000                      
  6 20/04/2000  25/04/2000  24/04/2000  0   

out db (inner joint) with transpose

 n_pz   admission_date  discharge_date  buffer_date buffer_result   buffer_date_1   buffer_result_1 buffer_date_1   buffer_result_1
  1 02/01/2000  07/01/2000  02/01/2000  1   03/01/2000  0   05/01/2000  0
  2 15/01/2000  16/01/2000  15/01/2000  1               
  3 20/02/2000  27/02/2000  25/02/2000  0               
  6 20/04/2000  25/04/2000  24/04/2000  0
ArTu
  • 431
  • 4
  • 20
  • Like this https://stackoverflow.com/q/65653029/5325862? Or https://stackoverflow.com/q/23934361/5325862? – camille Nov 28 '21 at 02:23
  • 2
    Range-based joins are supported with the following packages: `data.table`, `sqldf`, `fuzzyjoin`; it is not directly supported in base or `dplyr`. But *first*, make sure you're working with actual dates, not strings. `02/01/2000` is the latter, use `as.Date(.., format="%d/%m/%Y")` to convert them all to proper `Date`-class. – r2evans Nov 28 '21 at 02:57
  • Code you provide code to make the reproducible example please - data is too long for me to enter manually – rg255 Nov 30 '21 at 18:16
  • @camille these links are very interesting, but if I have to merge not only by date but also by id? – ArTu Dec 03 '21 at 12:41
  • @rg255 how can I make it reproducible? Thank you – ArTu Dec 03 '21 at 12:42
  • [Here's the guidance](https://stackoverflow.com/q/5963269/5325862) on reproducible examples. Some of that is also included in the FAQ text at the top of the [tag:r] tag – camille Dec 03 '21 at 16:06

0 Answers0