Averaging one column every 15 rows

Question

I have a dataset (datatable) with three columns:

date
time
price

The dataset is in one minute intervals and I need to convert this to fifteen minute intervals. To do this I want the average of the price per 15 rows. I need this in a new dataframe (of 15 times less rows than original) so I can copy/paste this in another dataset where there are already 15 minute intervals.

I tried to create a new list by getting the mean of every 15 rows with the below code:

means.price <- dt.Energy.prices[, mean(dt.Energy.prices$Lowest_price_downward), by= 
     (seq(nrow(dt.Energy.prices)) - 1) %/% 15]

I get a new dataset where all the means are NA. Means.price has around 70.000 obs. where original had around 1.05 million. It worked out to create a dataset which is 1/15 of original but the means are still missing.

Sometime the column price has an NA. So sometimes all 15 are NA, but sometimes 5 rows have a value where the other 10 have NA. If it are 15, I'm ok with output mean NA but when there are some values known, I hope there is a way to average the known values.

SO is not intended for "please write this for me", though it happens occasionally. This site is intended to be about "this code *here* does not work, can you help me identify/fix the bug". This question needs to following some principles of [reproducible questions](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example); I suggest you read that link as well as [minimal/verifiable questions](http://stackoverflow.com/help/mcve), then come back and edit your question to include *sample data* and your relevant code that is not working. — r2evans, Mar 01 '17 at 14:42
Are you looking for this `sapply(split(x, ceiling(seq_along(x)/15)), mean) `. — Chirayu Chamoli, Mar 01 '17 at 15:45
Thanks Chirayu. I think this works. Only that I get an NA when one of the fifteen values is a NA. Where do I fit na.rm = true in your line of code? — Max, Mar 01 '17 at 15:55

score 0 · Answer 1 · answered Mar 01 '17 at 15:33

One easy way to do this while avoiding conversion to a time series (though this may be beneficial for further work on this data) is to us na.rm = TRUE in your call to mean().

Here's an example of how this works:

> mean(c(2, 0, 0, 0))
[1] 0.5
> mean(c(2, 0, 0, NA))
[1] NA
> mean(c(2, 0, 0, NA), na.rm = TRUE)
[1] 0.6666667

From your code, it isn't very clear what Lowest_price_downward is, but from the way you are describing it, this is likely to help.

means.price <- dt.Energy.prices[, mean(dt.Energy.prices$Lowest_price_downward,
                                       na.rm = TRUE),
                                  by = (seq(nrow(dt.Energy.prices)) - 1) %/% 15]`

Hmm.. All 70.000 obs have 19.26 as a mean. I think R calculated the mean of the entire column for each instance instead of the mean of every 15 rows.. — Max, Mar 01 '17 at 15:51
Found out how to work with sapply with above help. You thanks for your effort as well! — Max, Mar 01 '17 at 16:00

Averaging one column every 15 rows

1 Answers1