Upsample large dataset

Question

Essentially, I'm looking to upsample my data to fill in the missing hours between forecast times.

I have a dataset that looks like this:

  case                              Regions        forecastTime WindSpeed_low WindSpeed_high  poly_id
1    1 EAST COAST-CAPE ST FRANCIS AND SOUTH 2010-01-01 09:00:00            35             45 fea1-289
2    1 EAST COAST-CAPE ST FRANCIS AND SOUTH 2010-01-01 12:00:00            25             NA fea1-289
3    1 EAST COAST-CAPE ST FRANCIS AND SOUTH 2010-01-03 03:00:00            25             NA fea1-289
4   27 EAST COAST-CAPE ST FRANCIS AND SOUTH 2010-01-05 09:00:00            15             20 fea1-289
5   27 EAST COAST-CAPE ST FRANCIS AND SOUTH 2010-01-05 16:00:00            00             NA fea1-289

Each issued forecast has a case number, an associated region and forecast time.

My goal is to expand the forecast times for each case to include all hours between the times the forecast changed:

  case                              Regions        forecastTime WindSpeed_low WindSpeed_high  poly_id
1    1 EAST COAST-CAPE ST FRANCIS AND SOUTH 2010-01-01 09:00:00            35             45 fea1-289
2    1 EAST COAST-CAPE ST FRANCIS AND SOUTH 2010-01-01 10:00:00            35             45 fea1-289
3    1 EAST COAST-CAPE ST FRANCIS AND SOUTH 2010-01-01 11:00:00            35             45 fea1-289
4    1 EAST COAST-CAPE ST FRANCIS AND SOUTH 2010-01-01 12:00:00            25             NA fea1-289
5    1 EAST COAST-CAPE ST FRANCIS AND SOUTH 2010-01-01 13:00:00            25             NA fea1-289

Here the forecast stays the same from 2010-01-01 09:00:00 through 2010-01-01 11:59:59 (fd$WindSpeed_low == 35 and fd$WindSpeed_high == 45). At 2010-01-01 12:00:00 it changes to fd$WindSpeed_low == 25 with fd$WindSpeed_high missing (NA). I was thinking I could group the forecasts by case, but I am stuck on how to do the expansion correctly. I am relatively new to R.

Possible duplicate of https://stackoverflow.com/questions/51019672/complete-dataframe-with-missing-combinations-of-values — akrun, Jul 28 '21 at 18:35

score 1 · Accepted Answer · answered Jul 27 '21 at 12:45

You can use complete() and fill() from tidyr:

library(dplyr)
library(tidyr)

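df %>%
  group_by(case, Regions) %>%
  # add one row per hour between the first and last forecast time of each case
  complete(forecastTime = seq(min(forecastTime), max(forecastTime), by = 'hour')) %>%
  # carry the last observed values down into the newly added rows
  fill(WindSpeed_low, poly_id) %>%
  ungroup()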
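
To try this on the rows shown in the question, here is a minimal sketch. The data frame fd is rebuilt by hand from the printed sample, and the UTC time zone and the name expanded are assumptions, not something taken from the original data.

```r
library(dplyr)
library(tidyr)

# Sample rows copied from the question; tz = "UTC" is an assumption.
fd <- tibble(
  case = c(1, 1, 1, 27, 27),
  Regions = "EAST COAST-CAPE ST FRANCIS AND SOUTH",
  forecastTime = as.POSIXct(
    c("2010-01-01 09:00:00", "2010-01-01 12:00:00", "2010-01-03 03:00:00",
      "2010-01-05 09:00:00", "2010-01-05 16:00:00"),
    tz = "UTC"
  ),
  WindSpeed_low = c(35, 25, 25, 15, 0),
  WindSpeed_high = c(45, NA, NA, 20, NA),
  poly_id = "fea1-289"
)

expanded <- fd %>%
  group_by(case, Regions) %>%
  complete(forecastTime = seq(min(forecastTime), max(forecastTime), by = "hour")) %>%
  fill(WindSpeed_low, poly_id) %>%
  ungroup()

nrow(expanded)  # 51: 43 hourly rows for case 1 plus 8 for case 27
```

Note that WindSpeed_high is not passed to fill() here, so it stays NA in every added row.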

Ronak Shah

df <- fd %>% group_by(case, Regions) %>% complete(forecastTime = seq(min(forecastTime), max(forecastTime), by = 'hour')) %>% fill(WindSpeed_low, WindSpeed_high, poly_id) %>% ungroup() works, but df$WindSpeed_high gets filled even where it should stay NA. – Jordan Ford Jul 27 '21 at 13:35
Can you provide data in a reproducible format so that I can verify what might be wrong? – Ronak Shah Jul 27 '21 at 14:20
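
One possible way around the NA problem raised in the comment above, offered as a sketch rather than as the answerer's method: tag each original forecast row with an id before expanding, carry only that id forward, and then fill the value columns within each id. A forecast issued with WindSpeed_high missing then keeps NA for all of its hours. The sample data is rebuilt by hand from the question, and the UTC time zone and the helper column issue_id are assumptions introduced for illustration.

```r
library(dplyr)
library(tidyr)

# Sample rows copied from the question; tz = "UTC" is an assumption.
fd <- tibble(
  case = c(1, 1, 1, 27, 27),
  Regions = "EAST COAST-CAPE ST FRANCIS AND SOUTH",
  forecastTime = as.POSIXct(
    c("2010-01-01 09:00:00", "2010-01-01 12:00:00", "2010-01-03 03:00:00",
      "2010-01-05 09:00:00", "2010-01-05 16:00:00"),
    tz = "UTC"
  ),
  WindSpeed_low = c(35, 25, 25, 15, 0),
  WindSpeed_high = c(45, NA, NA, 20, NA),
  poly_id = "fea1-289"
)

expanded <- fd %>%
  group_by(case, Regions) %>%
  # issue_id (hypothetical helper) numbers the original forecast rows in each case
  mutate(issue_id = row_number()) %>%
  complete(forecastTime = seq(min(forecastTime), max(forecastTime), by = "hour")) %>%
  # only the id is carried across forecast boundaries
  fill(issue_id) %>%
  # values are filled within one issued forecast, so an original NA stays NA
  group_by(case, Regions, issue_id) %>%
  fill(WindSpeed_low, WindSpeed_high, poly_id) %>%
  ungroup() %>%
  select(-issue_id)

head(expanded, 5)
```

With the sample rows, the 10:00 and 11:00 rows of case 1 inherit WindSpeed_high == 45 from the 09:00 forecast, while the rows from 12:00 onward keep NA, which matches the desired output in the question.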