Let's say these are my two datasets:
# This is dataset "df"
df <- tibble(country = rep(c("US", "UK", "FRA", "SPA"), times = 3),
date = rep(c("2020-01-01", "2020-02-01", "2020-03-01"), each = 4))
country date
1 FRA 2020-01-01
2 SPA 2020-01-01
3 UK 2020-01-01
4 US 2020-01-01
5 FRA 2020-02-01
6 SPA 2020-02-01
7 UK 2020-02-01
8 US 2020-02-01
9 FRA 2020-03-01
10 SPA 2020-03-01
11 UK 2020-03-01
12 US 2020-03-01
# This is dataset "dd"
dd <- tibble(country = rep(c("US", "UK", "FRA"), times = 3),
date = rep(c("2020-01-01", "2020-02-01", "2020-03-01"), each = 3),
cases = seq(1:9),
deaths = seq(from = 2, to = 18, by = 2))
country date cases deaths
1 FRA 2020-01-01 1 2
2 UK 2020-01-01 2 4
3 US 2020-01-01 3 6
4 FRA 2020-02-01 4 8
5 UK 2020-02-01 5 10
6 US 2020-02-01 6 12
7 FRA 2020-03-01 7 14
8 UK 2020-03-01 8 16
9 US 2020-03-01 9 18
I would like to create a new dataset called dj
which combines df
and the cases column in dd
by country and date. I also want NA to be printed in the cases column (in dj
) where there is missing data (i.e. for country "SPA").
In essence, I am trying to get dj
to have 12 rows and 3 columns (country, date, cases), with NA for cases in "SPA". Like this:
country date cases
1 FRA 2020-01-01 1
2 SPA 2020-01-01 NA
3 UK 2020-01-01 2
4 US 2020-01-01 3
5 FRA 2020-02-01 4
6 SPA 2020-02-01 NA
7 UK 2020-02-01 5
8 US 2020-02-01 6
9 FRA 2020-03-01 7
10 SPA 2020-03-01 NA
11 UK 2020-03-01 8
12 US 2020-03-01 9
I know you can combine two datasets using dplyr package *_join() functions, but how can I use these when I only want to merge a dataset with one particular column from another dataset and use more than one 'by' argument (i.e. country and date)??
I'm having difficulty in figuring this out myself since I'm new to R, so any help would be very much appreciated :)