
I have stock price data, and my data frame looks like this:

Time                   Price
2018-02-21 09:00:00am 122.12
2018-02-21 09:07:38am 122.43
2018-02-21 09:09:10am 122.44
2018-02-21 09:09:10am 122.45
2018-02-21 09:09:21am 122.26
2018-02-21 09:13:16am 122.37
....

To aggregate this time series and build a sliding window over it, I need to insert a row with Price == 000.00 for each missing minute, so I want the output to look like this:

Time                   Price
2018-02-21 09:00:00am 122.12
2018-02-21 09:01:00am 000.00
2018-02-21 09:02:00am 000.00
2018-02-21 09:03:00am 000.00
2018-02-21 09:04:00am 000.00
........
2018-02-21 09:07:38am 122.43
2018-02-21 09:08:00am 000.00
2018-02-21 09:09:10am 122.44
2018-02-21 09:09:10am 122.45
2018-02-21 09:09:21am 122.26
2018-02-21 09:13:16am 122.37
....

I tried, but got stuck.

Nisarg
  • not sure but check out the `zoo` anyways when working with time series. – Andre Elrico Aug 14 '18 at 12:46
  • The package `imputeTS` might be of help in order to replace missing values. – burton030 Aug 14 '18 at 12:51
  • You could easily build a dataframe with the 0 values at every minute and `merge` it with your initial data, making sure that your data overwrites the former dataframe values (when the data point is on a full minute). – lokxs Aug 14 '18 at 14:00
  • In base R, you can use `diff` to get the time differences between observations. Then for all instances where this difference is more than a minute, you can grab the starting and ending times, and use `seq.POSIXct` to create "dummy" timestamps for which you can then set `Price := 0` – Gautam Aug 14 '18 at 15:02
  • This might be helpful: https://stackoverflow.com/questions/16787038/insert-rows-for-missing-dates-times – sbha Aug 15 '18 at 00:41

1 Answer


There is probably a better way, but this could be done with dplyr:

library(dplyr)

Assume the original data frame is called df_missing, and make sure the date-time column is stored as POSIXct rather than as a string or factor. Also create a second column, time_merge, that rounds each time to the nearest minute. It is used only for joining with the complete time series data frame in the next step; the original times with seconds are preserved:

df_missing <- df_missing %>% 
  mutate(Time = as.POSIXct(Time, format="%Y-%m-%d %H:%M:%S")) %>% 
  mutate(time_merge = as.POSIXct(round(Time, "mins")))
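Two details of this step are easy to miss, so here is a small sketch, assuming timestamps in the same format as the question (the names `times_raw`, `rounded`, and `floored` are illustrative and not part of the answer). First, parsing stops once the format is satisfied, so the trailing "am" is ignored; if afternoon times are written on a 12-hour clock, a different format would probably be needed. Second, `round()` moves a time to the nearest minute, while `trunc()` drops the seconds:

```r
# Illustrative timestamps in the same format as the question's data
times_raw <- c("2018-02-21 09:07:38am", "2018-02-21 09:09:10am")

# The trailing "am" is ignored because parsing stops after %S
# (tz = "UTC" is an assumption made here to keep the example reproducible)
times <- as.POSIXct(times_raw, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# round() goes to the nearest minute; trunc() drops the seconds
rounded <- as.POSIXct(round(times, "mins"))
floored <- as.POSIXct(trunc(times, "mins"))

format(rounded, "%H:%M:%S")  # "09:08:00" "09:09:00"
format(floored, "%H:%M:%S")  # "09:07:00" "09:09:00"
```

With `round()`, the 09:07:38 observation is matched to the 09:08 slot, so 09:07:00 is the minute that gets a zero row. With `trunc()`, it would be matched to 09:07 and 09:08:00 would get the zero row instead.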

Create a data frame with a complete sequence of minutes from a start time to an end time. In this example, I'm assuming the time range runs from the earliest time in your original data to the latest, but the start and end could also be set manually:

# tibble() replaces the deprecated data_frame()
df_complete <- tibble(time_merge = seq.POSIXt(min(df_missing$Time), max(df_missing$Time), by="min"))
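One caveat, shown as a hedged sketch: the sequence starts at the earliest observation, so if that observation does not fall on a whole minute, every grid point inherits its seconds and will not match the rounded `time_merge` values. The timestamps and the names `first_obs`, `last_obs`, `grid_raw`, and `grid_snapped` below are hypothetical; truncating the start first is one possible way to keep the grid on whole minutes:

```r
# Hypothetical first and last observations that are not on whole minutes
first_obs <- as.POSIXct("2018-02-21 09:00:17", tz = "UTC")
last_obs  <- as.POSIXct("2018-02-21 09:03:05", tz = "UTC")

# Starting at the raw minimum yields an off-minute grid
grid_raw <- seq(first_obs, last_obs, by = "min")

# Truncating the start keeps every grid point on a whole minute
grid_snapped <- seq(as.POSIXct(trunc(first_obs, "mins")), last_obs, by = "min")

format(grid_raw, "%H:%M:%S")      # "09:00:17" "09:01:17" "09:02:17"
format(grid_snapped, "%H:%M:%S")  # "09:00:00" "09:01:00" "09:02:00" "09:03:00"
```

In the sample data the first observation is exactly 09:00:00, so the issue does not show up there.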

Then join the complete data frame with the original one to determine which minutes have a price, and fill in the missing values with zeros:

df_complete <- df_complete %>%
  left_join(df_missing, by = 'time_merge') %>%
  mutate(Time = if_else(is.na(Time), time_merge, Time)) %>%
  mutate(Price = if_else(is.na(Price), 0, Price)) %>%
  select(-time_merge)

>df_complete
# A tibble: 16 x 2
   Time                Price
   <dttm>              <dbl>
 1 2018-02-21 09:00:00  122.
 2 2018-02-21 09:01:00    0 
 3 2018-02-21 09:02:00    0 
 4 2018-02-21 09:03:00    0 
 5 2018-02-21 09:04:00    0 
 6 2018-02-21 09:05:00    0 
 7 2018-02-21 09:06:00    0 
 8 2018-02-21 09:07:00    0 
 9 2018-02-21 09:07:38  122.
10 2018-02-21 09:09:10  122.
11 2018-02-21 09:09:10  122.
12 2018-02-21 09:09:21  122.
13 2018-02-21 09:10:00    0 
14 2018-02-21 09:11:00    0 
15 2018-02-21 09:12:00    0 
16 2018-02-21 09:13:16  122.

# sample data
df_missing <- structure(
  list(
    Time = c("2018-02-21 09:00:00am", "2018-02-21 09:07:38am",
             "2018-02-21 09:09:10am", "2018-02-21 09:09:10am",
             "2018-02-21 09:09:21am", "2018-02-21 09:13:16am"),
    Price = c(122.12, 122.43, 122.44, 122.45, 122.26, 122.37)
  ),
  class = "data.frame",
  row.names = c(NA, -6L)
)
sbha