0

I have a dataset with specific dates and IDs

 date      station_id 
<date>     <chr>      
1 1978-01-01 US1CTFR0035
2 1978-01-01 US1NYSF0080
3 1978-01-01 USC00302964
4 1978-01-02 US1NYNS0037
5 1978-01-02 US1NYRC0002
6 1978-01-03 US1SCHR0014 

I have another dataset with data for every available date for almost all IDs

station_id  date        prcp
 <chr>       <chr>      <dbl>
1 US1CTFR0035 03/04/2016   140
2 US1CTFR0035 04/04/2016     0
3 US1CTFR0035 05/04/2016   137
4 US1CTFR0035 06/04/2016     0
5 US1CTFR0035 07/04/2016     0
6 US1CTFR0035 08/04/2016    28

I want to look up the corresponding values based on station_id and date from the second dataset and append a column in data 1. Also, some of the corresponding values will be missing in data 2 and I would like R to return NA if it is the case.

Anything I tried, R is giving me; datasets are not of the same length or similar error messages. Can anyone help me? Thanks

2 Answers2

1

Change the date column in data2 from character to Date class and then you can join the two datasets.

This can be done using dplyr and lubridate

library(dplyr)
library(lubridate)

data1 %>% left_join(data2 %>% 
                    mutate(date = dmy(date)), by = c('station_id', 'date'))

Or in base R :

data2$date <- as.Date(data2$date, "%d/%m/%Y")
merge(data1, data2, by = c('station_id', 'date'), all.x = TRUE)
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
0

An option with data.table would be

library(data.table)
setDT(data1)[data2[, date := as.IDate(date, "%d/%m/%Y")], on = .(station_id, date)]
akrun
  • 874,273
  • 37
  • 540
  • 662