I am trying to merge two longitudinal datasets, both of which are in long format.

df1:
patientid visit mental-health
703-FD    1     depressed
703-FD    2     depressed
703-FD    3     depressed
707-NM    1     non-depressed
707-NM    2     non-depressed
707-NM    3     depressed 

df2:
patientid visit HIV_disclosure 
703-FD    1     yes
703-FD    2     yes
703-FD    3     yes
707-NM    1     no
707-NM    2     no
707-NM    3     yes

Code I've tried:

data_combined <- full_join(x=df1, y=df2, by="patientid")

Result:

patientid visit.x mental-health  visit.y HIV_disclosure
703-FD    1       depressed      1       yes
703-FD    1       depressed      2       yes
703-FD    1       depressed      3       yes
703-FD    2       depressed      1       yes
703-FD    2       depressed      2       yes
703-FD    2       depressed      3       yes
703-FD    3       depressed      1       yes
703-FD    3       depressed      2       yes
703-FD    3       depressed      3       yes
707-NM    1       non-depressed  1       no
707-NM    1       non-depressed  2       no
707-NM    1       non-depressed  3       yes
707-NM    2       non-depressed  1       no
707-NM    2       non-depressed  2       no
707-NM    2       non-depressed  3       yes
707-NM    3       depressed      1       no
707-NM    3       depressed      2       no
707-NM    3       depressed      3       yes

How do I edit the above code to merge by both the patientid and the visit variable?

I've tried:

library (dplyr)
data_combined <- full_join(x=df1, y=df2, by="patientid", "visit")

Desired joined/merged dataframe:

patientid visit mental-health  HIV_disclosure
703-FD    1     depressed      yes
703-FD    2     depressed      yes
703-FD    3     depressed      yes
707-NM    1     non-depressed  no
707-NM    2     non-depressed  no
707-NM    3     depressed      yes

I'm sure it's a simple fix, but I've been struggling with it for a while; please assist.

Thandi

1 Answer

When no by argument is supplied, the dplyr join functions join on all variables the two data frames have in common and print a message naming them. In your data, the common variables are patientid and visit. So, for the sample data you provide, the following simplified code should work:

library(dplyr)
data_combined <- full_join(x=df1, y=df2)

If you want to specify the two columns explicitly (perhaps there are more columns in common than you want to join on), then you need to pass a character vector to the by argument:

data_combined <- full_join(x=df1, y=df2, by = c("patientid", "visit"))

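Your second attempt still supplied only by = "patientid". Because "visit" came after the comma, it was passed as a separate, unnamed argument, so full_join() matched it to its next unfilled parameter instead of treating it as part of by. Wrapping both names in c() makes them a single value for by.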
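
To check this on your own machine, here is a minimal sketch that rebuilds the sample data from the question and joins on both keys. It assumes the columns are stored as shown in the question, with one exception: the mental health column is named mental_health here (my assumption), since a hyphen is not valid in a syntactic R name.

```r
library(dplyr)

# Sample data rebuilt from the question.
# mental_health is an assumed column name (hyphens are not syntactic in R).
df1 <- data.frame(
  patientid = rep(c("703-FD", "707-NM"), each = 3),
  visit = rep(1:3, times = 2),
  mental_health = c("depressed", "depressed", "depressed",
                    "non-depressed", "non-depressed", "depressed")
)
df2 <- data.frame(
  patientid = rep(c("703-FD", "707-NM"), each = 3),
  visit = rep(1:3, times = 2),
  HIV_disclosure = c("yes", "yes", "yes", "no", "no", "yes")
)

# Join on both keys by passing them as one character vector.
data_combined <- full_join(df1, df2, by = c("patientid", "visit"))

nrow(data_combined)   # 6: one row per patient per visit
names(data_combined)  # patientid, visit, mental_health, HIV_disclosure
```

Because visit is part of the key, it appears once in the result rather than as visit.x and visit.y.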

Ben Norris