This is the closest link I've found: https://stats.stackexchange.com/questions/5305/how-to-re-sample-an-xts-time-series-in-r

But I don't see anything about the different ways to aggregate the data (like mean, count, anonymous function) which you can do in pandas.

For my program, I'm trying to have a dataframe be resampled every 2 minutes and take the average of the 2 values at each interval. Thanks!

Alex Petralia

4 Answers

4

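If you use data.table and lubridate, it might look something like this:

library(data.table)
library(lubridate)
# sample data: one random reading per minute
# passing tz= makes ymd() return POSIXct, which seq() needs for by='min'
ts <- seq(from=ymd('2015-01-01', tz='UTC'), to=ymd('2015-07-01', tz='UTC'), by='min')
dt <- data.table(ts=ts, datum=runif(length(ts),0,100))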

If you wanted to go from minute data to hourly means, you could do:

dt[,mean(datum),by=floor_date(ts,"hour")]

If you had a bunch of columns and wanted all of them averaged, you could do:

dt[,lapply(.SD,mean),by=floor_date(ts,"hour")]

You can replace mean with any function you like, and you can replace "hour" with "second", "minute", "day", "week", "month", or "year". You can't go from minutes to seconds, since that would require magic, but you can go from microseconds to seconds.
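As a rough sketch of swapping in other aggregations (the mean, count, and anonymous-function cases the question mentions), something like the following should work. The sample data and the column names `avg`, `n`, and `spread` are made up for illustration.

```r
library(data.table)
library(lubridate)

# Hypothetical sample data: one reading per minute for three hours
dt <- data.table(
  ts    = seq(from = ymd_hms("2015-01-01 00:00:00"), by = "min", length.out = 180),
  datum = rep(c(10, 20, 30), each = 60)
)

# Several aggregations per hourly bucket in one call
res <- dt[, .(
  avg    = mean(datum),                          # mean
  n      = .N,                                   # count of rows in the bucket
  spread = (function(x) max(x) - min(x))(datum)  # anonymous function
), by = .(hour = floor_date(ts, "hour"))]
print(res)
```

Each entry in `.()` becomes one output column, so this is roughly what passing several functions to `agg()` does in pandas.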

It is not possible to convert a series from a lower periodicity to a higher periodicity - e.g. weekly to daily or daily to 5 minute bars, as that would require magic.

-Jeffrey Ryan from xts manual.

I never learned xts, so I don't know the syntax to do it with xts objects, but that line is famous (or at least as famous as a line from a manual can be).

Dean MacGregor
  • This looks pretty close - is there anything to have it take the average every 5 minutes (in pandas, this is simply doing "5 min" in an argument). In other words, take the average of each of the 5 minutes and produce a time series like 12:30pm, 12:35pm, 12:40pm, etc.? This is to smooth the data. – Alex Petralia Jul 26 '15 at 20:44
  • There's no built-in way to do it that I know of, but you could do something like `dt[,lapply(.SD,mean),by=minutes(floor(as.numeric(difftime(ts,ymd('1970-01-01'),units="mins"))/5)*5)+ymd('1970-01-01')]`. That is: find the number of minutes since some arbitrary date, divide by whatever bucket width you want, take the floor, multiply by the same width, and add the result back to the same arbitrary date. – Dean MacGregor Jul 26 '15 at 22:43
  • @AlexPetralia Does this work for your needs? If so, would you mind accepting the answer please? – Dean MacGregor Jul 29 '15 at 21:30
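The floor-divide-multiply arithmetic described in the comment above can be sketched in base R alone. This is a minimal illustration under assumed sample data (ten one-minute readings starting at 12:30 UTC), not the commenter's exact code; `width` and `bucket` are names chosen here.

```r
# Hypothetical sample data: one reading per minute for ten minutes (UTC)
ts    <- as.POSIXct("2015-01-01 12:30:00", tz = "UTC") + 60 * (0:9)
datum <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

width <- 5 * 60  # bucket width in seconds (5 minutes)

# Floor each timestamp to the start of its 5-minute bucket:
# seconds since the epoch, divided by the width, floored, multiplied back
bucket <- as.POSIXct(floor(as.numeric(ts) / width) * width,
                     origin = "1970-01-01", tz = "UTC")

# Average the readings that fall in each bucket
res <- aggregate(datum, by = list(ts = bucket), FUN = mean)
print(res)
```

The result has one row per bucket, labeled by the bucket start (12:30, 12:35), with the mean in column `x`. More recent lubridate releases also appear to accept multi-unit strings such as `floor_date(ts, "5 minutes")`, which would replace the manual arithmetic if your installed version supports it.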
3

I found this topic while looking for an R equivalent of pandas resample() for xts objects. I'm posting a solution just in case, for a time delta of five minutes, where ts is an xts object:

period.apply(ts, endpoints(ts, k=5, "minutes"), mean)
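A self-contained sketch of the same call, assuming a small made-up series (`x` and `ep` are names chosen here for illustration):

```r
library(xts)

# Hypothetical sample data: ten one-minute observations starting at 12:30 UTC
idx <- as.POSIXct("2015-01-01 12:30:00", tz = "UTC") + 60 * (0:9)
x   <- xts(1:10, order.by = idx)

# Row indices of the last observation in each 5-minute period
ep <- endpoints(x, on = "minutes", k = 5)

# Apply mean() to each period delimited by those endpoints
res <- period.apply(x, INDEX = ep, FUN = mean)
print(res)
```

As far as I can tell, each result row is stamped with the last observation in its period rather than the period start, so the labels differ from what pandas produces with label='left'.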
Simon
2

You could use reticulate to call the pandas methods directly:

library(reticulate)
pd <- import("pandas")

df <- r_to_py(df) # convert to a pandas DataFrame
df <- df$set_index(pd$DatetimeIndex(df['Date']))
# older pandas versions accepted how='median'; newer ones use $agg() instead:
#df_median_hours <- df$resample('1H', how='median', closed='left', label='left')
df_median_hours <- df$resample('1H', closed='left', label='left')$agg('median')
df_median_hours <- py_to_r(df_median_hours) # convert back to an R data.frame
0

Have you looked into the R coin package? Here is a tutorial that might help you figure out whether this is what you are looking for: http://www.statmethods.net/stats/resampling.html

More information on the package can be found here: https://cran.r-project.org/web/packages/coin/coin.pdf

Jay
  • I'm not sure this is the same "resampling" that pandas means. This is statistical resampling, and I don't think the pandas "resample()" method has anything to do with that; it just happens to have the same name. – Alex Petralia Jul 26 '15 at 20:45