How to do resampling of data frames

Question

Here is my data frame:

       time            value
0   01-01-2015 00:00    72
1   01-01-2015 01:00    74
2   01-01-2015 02:00    75
3   01-01-2015 03:00    77
4   01-01-2015 06:00    72

If I pass this data frame to pandas, it gives me 24 entries, with zero as the value for the missing hours (this is also what I want).

syntax

resample_factor="H"

data_frame = data_frame.resample(resample_factor).mean()

I looked at a couple of links first, but they were not helpful.

Can we do this in R?

Please suggest how to do it, if it is possible.

You can do this in R: https://stackoverflow.com/a/31484550/10580543 if what you want is to fill the missing hours — tom, Sep 05 '19 at 12:42

Ronak Shah · Answer 1 · 2019-09-05T13:10:06.883

Maybe you are looking for tidyr::complete to fill in the missing hours. This creates an hourly sequence of 24 hours starting from the first value of time.

library(dplyr)

df %>%
  mutate(V2 = as.POSIXct(V2, format = "%d-%m-%Y %H:%M")) %>%
  arrange(V2) %>%
  tidyr::complete(V2 = seq(first(V2), first(V2) + 86400 - (60 * 60),by = "1 hour"), 
                 fill = list(V1 = 0, V3 = 0))


#   V2                     V1    V3
#   <dttm>              <dbl> <dbl>
# 1 2015-01-01 00:00:00     0    72
# 2 2015-01-01 01:00:00     1    74
# 3 2015-01-01 02:00:00     2    75
# 4 2015-01-01 03:00:00     3    77
# 5 2015-01-01 04:00:00     0     0
# 6 2015-01-01 05:00:00     0     0
# 7 2015-01-01 06:00:00     4    72
# 8 2015-01-01 07:00:00     0     0
# 9 2015-01-01 08:00:00     0     0
#10 2015-01-01 09:00:00     0     0
# … with 14 more rows

If the time doesn't start at 00:00, we can extract the date from date-time and create a sequence of 24 hours.

df %>%
  mutate(V2 = as.POSIXct(V2, format = "%d-%m-%Y %H:%M", tz = "GMT")) %>%
  tidyr::complete(V2 = seq(as.POSIXct(as.Date(first(V2))), by = "1 hour",
                           length.out = 24),
                  fill = list(V1 = 0, V3 = 0))

data

df <- structure(list(V1 = 0:4, V2 = structure(1:5, .Label = c("01-01-201500:00", 
"01-01-201501:00", "01-01-201502:00", "01-01-201503:00", "01-01-201506:00"
), class = "factor"), V3 = c(72L, 74L, 75L, 77L, 72L)), class = 
"data.frame", row.names = c(NA, -5L))

score 1 · Accepted Answer · answered Sep 05 '19 at 12:55

Here is a base R idea,

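# Midnight of the first day: the format stops after the date, so the
# trailing time of day in the string is ignored
start <- as.POSIXct(dd$V2[1], format = '%d-%m-%Y')

# 24 hourly timestamps, 00:00 through 23:00 (82800 s = 23 hours)
dates1 <- seq(start, start + 82800, by = '1 hour')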

merge(transform(dd, V2 = as.POSIXct(V2, format = '%d-%m-%Y %H:%M')),
      data.frame(V2 = dates1), 
      by = 'V2', all = TRUE)

which gives,

                    V2 V1 V3
1  2015-01-01 00:00:00  0 72
2  2015-01-01 01:00:00  1 74
3  2015-01-01 02:00:00  2 75
4  2015-01-01 03:00:00  3 77
5  2015-01-01 04:00:00 NA NA
6  2015-01-01 05:00:00 NA NA
7  2015-01-01 06:00:00  4 72
8  2015-01-01 07:00:00 NA NA
9  2015-01-01 08:00:00 NA NA
10 2015-01-01 09:00:00 NA NA
11 2015-01-01 10:00:00 NA NA
12 2015-01-01 11:00:00 NA NA
13 2015-01-01 12:00:00 NA NA
14 2015-01-01 13:00:00 NA NA
15 2015-01-01 14:00:00 NA NA
16 2015-01-01 15:00:00 NA NA
17 2015-01-01 16:00:00 NA NA
18 2015-01-01 17:00:00 NA NA
19 2015-01-01 18:00:00 NA NA
20 2015-01-01 19:00:00 NA NA
21 2015-01-01 20:00:00 NA NA
22 2015-01-01 21:00:00 NA NA
23 2015-01-01 22:00:00 NA NA
24 2015-01-01 23:00:00 NA NA

NOTE: You can replace the NA values (for example with 0) in the usual way.
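As a minimal sketch of that NA replacement, assuming 0 is the fill value the question asks for: the variable name `res` and the `tz = "UTC"` setting are not part of the answer and are only used here to keep the example self-contained and free of daylight-saving effects.

```r
# Sample data from the answer
dd <- data.frame(
  V1 = 0:4,
  V2 = c("01-01-2015 00:00", "01-01-2015 01:00", "01-01-2015 02:00",
         "01-01-2015 03:00", "01-01-2015 06:00"),
  V3 = c(72L, 74L, 75L, 77L, 72L),
  stringsAsFactors = FALSE
)

# Assumption: timestamps are treated as UTC
start  <- as.POSIXct(dd$V2[1], format = "%d-%m-%Y", tz = "UTC")
dates1 <- seq(start, by = "1 hour", length.out = 24)

# Same create-the-sequence-and-merge step as in the answer
res <- merge(
  transform(dd, V2 = as.POSIXct(V2, format = "%d-%m-%Y %H:%M", tz = "UTC")),
  data.frame(V2 = dates1),
  by = "V2", all = TRUE
)

# Replace the NAs introduced by the merge with 0, column by column
res$V1[is.na(res$V1)] <- 0L
res$V3[is.na(res$V3)] <- 0L
```

After this, `res` has 24 rows and the hours that were absent from `dd` carry 0 instead of NA.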

DATA

dput(dd)
structure(list(V1 = 0:4, V2 = c("01-01-2015 00:00", "01-01-2015 01:00", 
"01-01-2015 02:00", "01-01-2015 03:00", "01-01-2015 06:00"), 
    V3 = c(72L, 74L, 75L, 77L, 72L)), row.names = c(NA, -5L), class = "data.frame")

answered Sep 05 '19 at 12:55

Sotos


This gives me the expected output, just let me try it in my console! – jony Sep 05 '19 at 13:01
One more question: does it work for "daily" and "monthly" as well? – jony Sep 05 '19 at 13:02
Yes, you can do daily, monthly, every n minutes...etc – Sotos Sep 05 '19 at 13:15
Please tell me how! – jony Sep 05 '19 at 13:31
Just change the `by` argument in `dates1`. Try `seq(..., ..., by = '5 mins')` – Sotos Sep 05 '19 at 13:33
Thanks. Is there any alternative way to do this? This answer is perfect, though. – jony Sep 05 '19 at 13:35
See more info about `by` [here](https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/seq.Date) – Sotos Sep 05 '19 at 13:36
The concept is the same...Create the sequence and merge. However, different packages have different functions. For example Ronak's answer uses `complete` from `tidyr` package. I chose base R as you don't have to load additional libraries... – Sotos Sep 05 '19 at 13:37
Hey, if I convert this to days, it should give me one entry, but I am not able to do that with this. Please help me. In Python it works with `data_frame.resample(resample_factor).mean()`, with this syntax ... `R_facto="H"` – jony Sep 05 '19 at 15:37
Just change the `by` argument to `1 day` and it should work – Sotos Sep 06 '19 at 06:18
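
The comments above touch on two different operations. Changing `by` changes the spacing of the grid that is merged in, while getting a single entry per day (as pandas' `resample(...).mean()` does) needs a grouping step as well. The following is a rough base R sketch of both, not something taken from the answers: the names `parsed`, `grid_5min`, `filled` and `daily`, the `tz = "UTC"` setting, and the use of `aggregate()` are all assumptions made for illustration.

```r
# Sample data from the answer
dd <- data.frame(
  V1 = 0:4,
  V2 = c("01-01-2015 00:00", "01-01-2015 01:00", "01-01-2015 02:00",
         "01-01-2015 03:00", "01-01-2015 06:00"),
  V3 = c(72L, 74L, 75L, 77L, 72L),
  stringsAsFactors = FALSE
)

# Assumption: timestamps are treated as UTC
parsed <- transform(dd, V2 = as.POSIXct(V2, format = "%d-%m-%Y %H:%M", tz = "UTC"))

# Finer grid: only the `by` argument of seq() changes
start     <- as.POSIXct(dd$V2[1], format = "%d-%m-%Y", tz = "UTC")
grid_5min <- seq(start, by = "5 min", length.out = 24 * 12)
filled    <- merge(parsed, data.frame(V2 = grid_5min), by = "V2", all = TRUE)

# Coarser result: group the rows by calendar day and average each group
parsed$day <- as.Date(parsed$V2, tz = "UTC")
daily      <- aggregate(V3 ~ day, data = parsed, FUN = mean)
# daily has one row: 2015-01-01 with a mean V3 of 74
```

For monthly means, the same pattern should work with a month key such as `format(parsed$V2, "%Y-%m")` in place of `day`.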
