1

I have a dataframe that contains hourly weather information. I would like to increase the granularity of the time measurements (5-minute intervals instead of 60-minute intervals) while copying the other columns' data into the newly created rows:

Current Dataframe Structure:

Date                Temperature Humidity
2015-01-01 00:00:00 25          0.67
2015-01-01 01:00:00 26          0.69

Target Dataframe Structure:

Date                Temperature Humidity 
2015-01-01 00:00:00 25          0.67
2015-01-01 00:05:00 25          0.67
2015-01-01 00:10:00 25          0.67
.
.
.
2015-01-01 00:55:00 25          0.67
2015-01-01 01:00:00 26          0.69
2015-01-01 01:05:00 26          0.69
2015-01-01 01:10:00 26          0.69
.
.
.

What I've Tried:

for(i in 1:nrow(df)) {


  five.minutes <- seq(df$date[i], length = 12, by = "5 mins")

  for(j in 1:length(five.minutes)) {

    df$date[i]<-rbind(five.minutes[j])

  }
}

Error I'm getting:

Error in as.POSIXct.numeric(value) : 'origin' must be supplied

lsch91
  • I apologize, I made an error when initially typing it in, so I was just fixing the typo. No, the target dataframe should not change. For example, 2015-01-01 01:00:00-01:55:00 should have temperature = 26 and humidity = 0.69, while 2015-01-01 00:00:00-00:55:00 should have temperature = 25 and humidity = 0.67 – lsch91 Feb 17 '18 at 23:07
  • They aren't. Here, let me clarify the target dataframe above in the question – lsch91 Feb 17 '18 at 23:09

2 Answers

1

One possible solution uses fill from tidyr and right_join from dplyr.

The approach is to create a date/time series running from the minimum timestamp in the dataframe to the maximum plus 55 minutes. Right-joining the dataframe onto that series produces every desired row, with NA for Temperature and Humidity wherever no hourly observation exists. fill then populates those NA values with the previous valid values.

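# Packages that provide %>%, right_join(), arrange() and fill()
library(dplyr)
library(tidyr)

# Data
df <- read.table(text = "Date                Temperature Humidity
'2015-01-01 00:00:00' 25          0.67
'2015-01-01 01:00:00' 26          0.69
'2015-01-01 02:00:00' 28          0.69
'2015-01-01 03:00:00' 25          0.69", header = TRUE, stringsAsFactors = FALSE)

df$Date <- as.POSIXct(df$Date, format = "%Y-%m-%d %H:%M:%S")

# Create a dataframe with every date/time at 5-minute intervals,
# running 55 minutes past the last observation
Dates <- data.frame(Date = seq(min(df$Date), max(df$Date) + 55 * 60, by = 5 * 60))

# Join, sort by time (join row order can differ between dplyr versions),
# then carry the last observed values forward; the result is a new dataframe
result <- df %>%
  right_join(Dates, by = "Date") %>%
  arrange(Date) %>%
  fill(Temperature, Humidity)

result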
#                  Date Temperature Humidity
#1  2015-01-01 00:00:00          25     0.67
#2  2015-01-01 00:05:00          25     0.67
#3  2015-01-01 00:10:00          25     0.67
#4  2015-01-01 00:15:00          25     0.67
#5  2015-01-01 00:20:00          25     0.67
#6  2015-01-01 00:25:00          25     0.67
#7  2015-01-01 00:30:00          25     0.67
#8  2015-01-01 00:35:00          25     0.67
#9  2015-01-01 00:40:00          25     0.67
#10 2015-01-01 00:45:00          25     0.67
#11 2015-01-01 00:50:00          25     0.67
#12 2015-01-01 00:55:00          25     0.67
#13 2015-01-01 01:00:00          26     0.69
#14 2015-01-01 01:05:00          26     0.69
#.....
#.....
#44 2015-01-01 03:35:00          25     0.69
#45 2015-01-01 03:40:00          25     0.69
#46 2015-01-01 03:45:00          25     0.69
#47 2015-01-01 03:50:00          25     0.69
#48 2015-01-01 03:55:00          25     0.69
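
The same expansion can also be sketched without a join. This is a minimal base R variation, not part of the approach above, and it assumes every row marks the start of a full hour, so each row is simply repeated 12 times and shifted by 0, 5, ..., 55 minutes. The input data and the names `steps_per_hour`, `offsets` and `expanded` are illustrative only.

```r
# Hypothetical input with the same structure as the question's dataframe
df <- data.frame(
  Date = as.POSIXct(c("2015-01-01 00:00:00", "2015-01-01 01:00:00"), tz = "UTC"),
  Temperature = c(25, 26),
  Humidity = c(0.67, 0.69)
)

steps_per_hour <- 12                                          # 60 min / 5 min
offsets <- seq(0, by = 5 * 60, length.out = steps_per_hour)   # seconds: 0, 300, ..., 3300

# Repeat every row 12 times, then shift each copy by its offset in seconds
expanded <- df[rep(seq_len(nrow(df)), each = steps_per_hour), ]
expanded$Date <- expanded$Date + rep(offsets, times = nrow(df))
rownames(expanded) <- NULL

head(expanded, 13)
```

Unlike the join-and-fill version, this sketch never creates NA values, but it would silently produce wrong timestamps if the input contained gaps or non-hourly rows.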
MKR
  • Thanks for the help. When I try the code above I now get the error message: Error in UseMethod("right_join") : no applicable method for 'right_join' applied to an object of class "function" – lsch91 Feb 18 '18 at 00:10
  • Yeah, I have that one loaded and I have the %>% in both locations indicated above – lsch91 Feb 18 '18 at 00:17
  • It prints the first 58 rows (which all look modified properly), but I'm not familiar with the fill() function. Are the results supposed to be added to the "Dates" dataframe or the "df" dataframe? – lsch91 Feb 18 '18 at 00:33
  • @lsch91 You need to assign the result to a new dataframe, or you can even assign it to `df`. Let me modify the answer to make it clear for you. – MKR Feb 18 '18 at 00:36
  • Gotcha. I was thinking that would be the case. Thanks for your help! – lsch91 Feb 18 '18 at 00:37
  • @lsch91 Happy to help. You can delete most of the conversation, as it won't help future users. – MKR Feb 18 '18 at 00:38
0

I think this might do:

library(tibble)
library(lubridate)

df <- tibble(DateTime = c("2015-01-01 00:00:00", "2015-01-01 01:00:00"),
             Temperature = c(25, 26), Humidity = c(.67, .69))
df$DateTime <- ymd_hms(df$DateTime)
n <- nrow(df)
# Twelve 5-minute steps per hourly gap (right endpoint excluded), then the final timestamp
DateTime <- as.POSIXct(c(sapply(1:(n - 1), function(x) seq(from = df$DateTime[x], by = "5 min", length.out = 12)),
                         as.numeric(df$DateTime[n])),
                       origin = "1970-01-01", tz = "UTC")
Temperature <- c(sapply(1:(n - 1), function(x) rep(df$Temperature[x], 12)), df$Temperature[n])
Humidity <- c(sapply(1:(n - 1), function(x) rep(df$Humidity[x], 12)), df$Humidity[n])
tibble(DateTime = format(DateTime, "%Y-%m-%d %H:%M:%S"), Temperature, Humidity)

<chr>                          <dbl>    <dbl>
 1 2015-01-01 00:00:00             25.0    0.670
 2 2015-01-01 00:05:00             25.0    0.670
 3 2015-01-01 00:10:00             25.0    0.670
 4 2015-01-01 00:15:00             25.0    0.670
 5 2015-01-01 00:20:00             25.0    0.670
 6 2015-01-01 00:25:00             25.0    0.670
 7 2015-01-01 00:30:00             25.0    0.670
 8 2015-01-01 00:35:00             25.0    0.670
 9 2015-01-01 00:40:00             25.0    0.670
10 2015-01-01 00:45:00             25.0    0.670
11 2015-01-01 00:50:00             25.0    0.670
12 2015-01-01 00:55:00             25.0    0.670
13 2015-01-01 01:00:00             26.0    0.690
Antonios