0

this is probably stupid, but i have the following problem: I have two tables:

1)Table with therapies on a specific patient with beginning and ending date:

  therapyID patientID startoftherapy endoftherapy
1          1       233         5.5.10       6.6.11
2          2       233         7.7.11       8.8.11
3          3       344         1.1.09       3.2.10
4          4       344         3.3.10     10.10.11
5          5       544         2.1.09       3.2.10
6          6       544         4.3.12       4.3.14
7          7       113         1.1.12       1.1.15
8          8       123         2.1.13       1.1.15
9          9       543         2.1.09       3.2.10
10        10       533         7.7.11       8.8.14

2)Table with many diagnoses, the specific patient and date and description:

 diagnosisID dateofdiagnosis patientID diagnosis
1          11          8.8.10       233       xxx
2          22         5.10.11       233       yyy
3          33          8.9.11       233       xxx
4          44          2.2.09       344       zzz
5          55          3.3.09       344       yyy
6         666          2.2.12       123       zzz
7         777          3.3.12       123       yyy
8         555          3.2.10       543       xxx
9         203          8.8.12       533       zzz

I want to create a new table, with the diagnoses of the patieents in the time of their therapy, i.e. with the matching criteria: patientID, date between startoftherapy and endoftherapy. Something like this:

therapyID diagnosisID patientID dateofdiagnosis diagnosis
1         1          11       233        08.08.10       xxx
2         2          22       233        05.10.11       yyy
3         2          33       233        08.09.11       xxx

I´m way to unexperienced to do this, can anyone help me with this or point me in the right direction?

Ben
  • 3
  • 4
  • 1
    Possible duplicate of [How to join (merge) data frames (inner, outer, left, right)?](http://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) – Chris Apr 15 '16 at 17:42

1 Answers1

0

We can do it with `plyr:

# We recreate your data.frames
df1 <- read.table(text="
therapyID patientID startoftherapy endoftherapy
1          1       233         5.5.10       6.6.11
2          2       233         7.7.11       8.8.11
3          3       344         1.1.09       3.2.10
4          4       344         3.3.10     10.10.11
5          5       544         2.1.09       3.2.10
6          6       544         4.3.12       4.3.14
7          7       113         1.1.12       1.1.15
8          8       123         2.1.13       1.1.15
9          9       543         2.1.09       3.2.10
10        10       533         7.7.11       8.8.14", h=T)

df2 <- read.table(text="
diagnosisID dateofdiagnosis patientID diagnosis
1          11          8.8.10       233       xxx
2          22         5.10.11       233       yyy
3          33          8.9.11       233       xxx
4          44          2.2.09       344       zzz
5          55          3.3.09       344       yyy
6         666          2.2.12       123       zzz
7         777          3.3.12       123       yyy
8         555          3.2.10       543       xxx
9         203          8.8.12       533       zzz", h=T)

We load dplyr ; install.packages("dplyr") if you don't have it.

    library(dplyr)

Then we left_join by patientID. A graphical definition (and more) can be found here. Then we just rearrange column order.

# we first left_join 
    left_join(df1, df2, "patientID") %>% 
      select(therapyID,diagnosisID,patientID, dateofdiagnosis, diagnosis) %>% 
      arrange(therapyID)

We obtain:

therapyID diagnosisID patientID dateofdiagnosis diagnosis
1          1          11       233          8.8.10       xxx
2          1          22       233         5.10.11       yyy
3          1          33       233          8.9.11       xxx
4          2          11       233          8.8.10       xxx

The output may be different from the one you provided because of row order. It can be changed with arrange. Is this what you want?


EDIT

I want to sort out cases where date of diagnosis did not happened during the therapy

Then you first need to properly convert time column to date format. This function does the job for your format:

ch2date <- function(x) as.Date(x, format="%d.%m.%y")

We can include it to the pipe and then use these columns for filtering:

left_join(df1, df2, "patientID") %>% 
  mutate(startoftherapy  = ch2date(startoftherapy),
         endoftherapy    = ch2date(endoftherapy),
         dateofdiagnosis = ch2date(dateofdiagnosis)) %>% 
  filter(startoftherapy < dateofdiagnosis, dateofdiagnosis < endoftherapy) %>% 
  select(therapyID, diagnosisID, patientID, dateofdiagnosis, diagnosis) %>% 
  arrange(therapyID)

Does it solve your problem?

Vincent Bonhomme
  • 7,235
  • 2
  • 27
  • 38
  • Hi Vincent, thank you for your answer. This is almost what i needed. I still need to sort out the diagnoses, which didn´t occur inside the time of therapy: – Ben Apr 20 '16 at 17:06
  • thank you very much @Vincent Bonhomme ( I´m late with the thank you because the rules made me hesitate, but i have seen everybody doin it, so: thank u:) ) – Ben Jun 02 '16 at 09:15