Merging / Joining Data While Keeping All Data from Both Data Frames (even if no matches)

Question

I have two data frames that I am trying to combine into one master data frame by ID and Date. My issue is that the data frames have some similar and some unique dates. One data frame goes sequentially through the dates but may be missing a day or two at the beginning and/or end, while the other data frame has multiple samples from the same ID and is only sampled every 3 days.

DF1 example:

Nest.ID     Date        X       Y       Nest.ID.Date
AMRO_1_     5/2/20      7       2       AMRO_1_5-02-20
AMRO_1_     5/3/20      1       5       AMRO_1_5-03-20
AMRO_1_     5/4/20      7       9       AMRO_1_5-04-20
AMRO_1_     5/5/20      3       2       AMRO_1_5-05-20
AMRO_1_     5/6/20      1       3       AMRO_1_5-06-20

DF2 Example

Nest.ID     Indiv.ID    Date        U       V       Nest.ID.Date
AMRO_1_     A           5/1/20      468     294     AMRO_1_5-01-20
AMRO_1_     B           5/1/20      454     456     AMRO_1_5-01-20
AMRO_1_     C           5/1/20      436     245     AMRO_1_5-01-20
AMRO_1_     A           5/4/20      356     762     AMRO_1_5-04-20
AMRO_1_     B           5/4/20      345     953     AMRO_1_5-04-20
AMRO_1_     C           5/4/20      356     345     AMRO_1_5-04-20
AMRO_1_     A           5/7/20      763     193     AMRO_1_5-07-20
AMRO_1_     B           5/7/20      763     186     AMRO_1_5-07-20
AMRO_1_     C           5/7/20      235     762     AMRO_1_5-07-20

Wanted Outcome:

Nest.ID     Date        X       Y       Indiv.ID    U       V
AMRO_1_     5/1/20      NA      NA      A           468     294
AMRO_1_     5/1/20      NA      NA      B           454     456
AMRO_1_     5/1/20      NA      NA      C           436     245
AMRO_1_     5/2/20      7       2       NA          NA      NA
AMRO_1_     5/3/20      1       5       NA          NA      NA
AMRO_1_     5/4/20      7       9       A           356     762
AMRO_1_     5/4/20      7       9       B           345     953
AMRO_1_     5/4/20      7       9       C           356     345
AMRO_1_     5/5/20      3       2       NA          NA      NA
AMRO_1_     5/6/20      1       3       NA          NA      NA
AMRO_1_     5/7/20      NA      NA      A           763     193
AMRO_1_     5/7/20      NA      NA      B           763     186
AMRO_1_     5/7/20      NA      NA      C           235     762

Using a full_join with keep= TRUE I can get very close and get a final DF with all the data from DF1 and with NAs where DF2 wasn't sampled, but I cannot figure out how to also include the beginning/end dates that are not in DF1 but in DF2 (so 5/1 and 5/7 in the examples). I suspect this may be due to my "key" of ID.Date, but those are the only variables I can use to keep things in order when merging (in other words, since I have multiple samples per ID, I can't just use ID as my "key").

It would be helpful to see the actual snippet of code you're trying to debug. If you're using `dplyr::full_join`, that doesn't have an `all` argument, but it does have a `keep` argument — camille, Sep 29 '20 at 23:34
Please in code questions give a [mre]--cut & paste & runnable code, including smallest representative example input as code; desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. Give the least code you can that is code that you show is OK extended by code that you show is not OK. (Debugging fundamental.) For R/SQL that includes DBMS & DDL (including constraints & indexes) & input as code formatted as a table. [ask] Pause work on the overall goal, chop code to the 1st expression not giving what you expect & say what you expect & why. — philipxy, Sep 29 '20 at 23:42
@camille Ah yes, that is what I meant. I've been switching between merge() and join() and switched the two in my head. Regardless, ````keep = ```` doesn't do the job either and I am still lacking those beginning and ending dates from DF2 in my final merged DF — EagleEye, Sep 30 '20 at 00:23
Joins are not on keys, they are on conditions. Keys & other constraints need not be known to query; input & output table meanings are necessary & sufficient to query. **When giving a business relation(ship)/association or table (base or query result), say what a row in it states about the business situation in terms of its column values.** [Is there any rule of thumb to construct SQL query from a human-readable description?](https://stackoverflow.com/a/33952141/3404097) [Re relational querying.](https://stackoverflow.com/a/55671890/3404097) You don't now clearly say what rows get returned. — philipxy, Sep 30 '20 at 01:01
You seem to just want a projection of a full join on equality of Nest.ID.Date values & equality of Nest.ID values, ie on equality of Date values & equality of Nest.ID values. You say you tried that & didn't get what you wanted in a comment on a deleted answer, but what you say you get is not the result of the full join you say you did, and your example output is. [mre] — philipxy, Sep 30 '20 at 01:28

score 1 · Accepted Answer · answered Sep 30 '20 at 01:42

You want to do a full_join() from dplyr.

Here are the data in friendlier form:

DF1 <- structure(list(Nest.ID = c("AMRO_1_", "AMRO_1_", "AMRO_1_", "AMRO_1_", 
"AMRO_1_"), Date = c("5/2/20", "5/3/20", "5/4/20", "5/5/20", 
"5/6/20"), X = c(7, 1, 7, 3, 1), Y = c(2, 5, 9, 2, 3), Nest.ID.Date = c("AMRO_1_5-02-20", 
"AMRO_1_5-03-20", "AMRO_1_5-04-20", "AMRO_1_5-05-20", "AMRO_1_5-06-20"
)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-5L), spec = structure(list(cols = list(Nest.ID = structure(list(), class = c("collector_character", 
"collector")), Date = structure(list(), class = c("collector_character", 
"collector")), X = structure(list(), class = c("collector_double", 
"collector")), Y = structure(list(), class = c("collector_double", 
"collector")), Nest.ID.Date = structure(list(), class = c("collector_character", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec"))

DF2 <- structure(list(Nest.ID = c("AMRO_1_", "AMRO_1_", "AMRO_1_", "AMRO_1_", 
"AMRO_1_", "AMRO_1_", "AMRO_1_", "AMRO_1_", "AMRO_1_"), Indiv.ID = c("A", 
"B", "C", "A", "B", "C", "A", "B", "C"), Date = c("5/1/20", "5/1/20", 
"5/1/20", "5/4/20", "5/4/20", "5/4/20", "5/7/20", "5/7/20", "5/7/20"
), U = c(468, 454, 436, 356, 345, 356, 763, 763, 235), V = c(294, 
456, 245, 762, 953, 345, 193, 186, 762), Nest.ID.Date = c("AMRO_1_5-01-20", 
"AMRO_1_5-01-20", "AMRO_1_5-01-20", "AMRO_1_5-04-20", "AMRO_1_5-04-20", 
"AMRO_1_5-04-20", "AMRO_1_5-07-20", "AMRO_1_5-07-20", "AMRO_1_5-07-20"
)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-9L), spec = structure(list(cols = list(Nest.ID = structure(list(), class = c("collector_character", 
"collector")), Indiv.ID = structure(list(), class = c("collector_character", 
"collector")), Date = structure(list(), class = c("collector_character", 
"collector")), U = structure(list(), class = c("collector_double", 
"collector")), V = structure(list(), class = c("collector_double", 
"collector")), Nest.ID.Date = structure(list(), class = c("collector_character", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec"))

Here is the code.

library(dplyr)
library(lubridate)

DF1 %>% full_join(DF2) %>%
  select(-Nest.ID.Date) %>%
  mutate(Date = mdy(Date)) %>%
  arrange(Date)  

Joining, by = c("Nest.ID", "Date", "Nest.ID.Date")
# A tibble: 13 x 7
   Nest.ID Date           X     Y Indiv.ID     U     V
   <chr>   <date>     <dbl> <dbl> <chr>    <dbl> <dbl>
 1 AMRO_1_ 2020-05-01    NA    NA A          468   294
 2 AMRO_1_ 2020-05-01    NA    NA B          454   456
 3 AMRO_1_ 2020-05-01    NA    NA C          436   245
 4 AMRO_1_ 2020-05-02     7     2 NA          NA    NA
 5 AMRO_1_ 2020-05-03     1     5 NA          NA    NA
 6 AMRO_1_ 2020-05-04     7     9 A          356   762
 7 AMRO_1_ 2020-05-04     7     9 B          345   953
 8 AMRO_1_ 2020-05-04     7     9 C          356   345
 9 AMRO_1_ 2020-05-05     3     2 NA          NA    NA
10 AMRO_1_ 2020-05-06     1     3 NA          NA    NA
11 AMRO_1_ 2020-05-07    NA    NA A          763   193
12 AMRO_1_ 2020-05-07    NA    NA B          763   186
13 AMRO_1_ 2020-05-07    NA    NA C          235   762

Merging / Joining Data While Keeping All Data from Both Data Frames (even if no matches)

1 Answers1