0

I have a dataset (datatable) with three columns:

  • date
  • time
  • price

The dataset is in one minute intervals and I need to convert this to fifteen minute intervals. To do this I want the average of the price per 15 rows. I need this in a new dataframe (of 15 times less rows than original) so I can copy/paste this in another dataset where there are already 15 minute intervals.

I tried to create a new list by getting the mean of every 15 rows with the below code:

means.price <- dt.Energy.prices[, mean(dt.Energy.prices$Lowest_price_downward), by= 
     (seq(nrow(dt.Energy.prices)) - 1) %/% 15]

I get a new dataset where all the means are NA. Means.price has around 70.000 obs. where original had around 1.05 million. It worked out to create a dataset which is 1/15 of original but the means are still missing.

Sometime the column price has an NA. So sometimes all 15 are NA, but sometimes 5 rows have a value where the other 10 have NA. If it are 15, I'm ok with output mean NA but when there are some values known, I hope there is a way to average the known values.

lmo
  • 37,904
  • 9
  • 56
  • 69
Max
  • 31
  • 1
  • 6
  • 1
    SO is not intended for "please write this for me", though it happens occasionally. This site is intended to be about "this code *here* does not work, can you help me identify/fix the bug". This question needs to following some principles of [reproducible questions](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example); I suggest you read that link as well as [minimal/verifiable questions](http://stackoverflow.com/help/mcve), then come back and edit your question to include *sample data* and your relevant code that is not working. – r2evans Mar 01 '17 at 14:42
  • Ok, thanks! First time using the website! – Max Mar 01 '17 at 14:43
  • I edited my comment to explain what I've tried so far! – Max Mar 01 '17 at 14:59
  • 2
    Are you looking for this `sapply(split(x, ceiling(seq_along(x)/15)), mean) `. – Chirayu Chamoli Mar 01 '17 at 15:45
  • Thanks Chirayu. I think this works. Only that I get an NA when one of the fifteen values is a NA. Where do I fit na.rm = true in your line of code? – Max Mar 01 '17 at 15:55
  • Just after mean I found out! Works. Thanks! – Max Mar 01 '17 at 16:00

1 Answers1

0

One easy way to do this while avoiding conversion to a time series (though this may be beneficial for further work on this data) is to us na.rm = TRUE in your call to mean().

Here's an example of how this works:

> mean(c(2, 0, 0, 0))
[1] 0.5
> mean(c(2, 0, 0, NA))
[1] NA
> mean(c(2, 0, 0, NA), na.rm = TRUE)
[1] 0.6666667

From your code, it isn't very clear what Lowest_price_downward is, but from the way you are describing it, this is likely to help.

means.price <- dt.Energy.prices[, mean(dt.Energy.prices$Lowest_price_downward,
                                       na.rm = TRUE),
                                  by = (seq(nrow(dt.Energy.prices)) - 1) %/% 15]`
Alex Firsov
  • 168
  • 11
  • Hmm.. All 70.000 obs have 19.26 as a mean. I think R calculated the mean of the entire column for each instance instead of the mean of every 15 rows.. – Max Mar 01 '17 at 15:51
  • Found out how to work with sapply with above help. You thanks for your effort as well! – Max Mar 01 '17 at 16:00