splitting in samples and operating on them

Question

I am just beginning with R and I have a beginner's question.

I have the following data frame (simplified):

Time: 00:01:00 00:02:00 00:03:00 00:04:00   ....

Flow: 2          4         5      1         ....

I would like to know the mean flow every two minutes instead of every minute. I need this for many hours of data.

I want to save those new means in a list. How can I do this using an apply function?

Please include a large enough sample of your data that we can work with. See [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for how to make an R question that folks can recreate. Also please post what you've tried so far. — camille, Apr 27 '18 at 15:17
if your data is always in 1 minute intervals, then you could try the function `rollapply` from the package `zoo`. See the example: `z <- zoo(11:15, as.Date(31:35)); rollapply(z, 2, mean)` . As you are new to R, `install.packages("zoo")` & `library("zoo")` will allow you to use this function — Jonny Phelps, Apr 27 '18 at 15:21

score 0 · Answer 1 · edited Apr 27 '18 at 20:46

You can create a new variable by rounding your time variable down to the nearest two minutes, then use data.table to calculate the mean of Flow within each two-minute group.

To help you precisely, you would need to show how your data is set up. If, for instance, your data is set up like this:

library(data.table)
dt <- data.table(Time = 0:3, Flow = c(2, 4, 5, 1))

Then the following would work for you:

dt[, twomin := floor(Time/2)*2]
dt[, mean(Flow), by = twomin]
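
If Time is stored as HH:MM:SS strings, as in the question, the same floor-based binning can be sketched in base R. This is only an illustration: the helper name `to_minutes` is hypothetical, the strings are assumed to be well-formed, and windows are counted from the first timestamp so that 00:01 and 00:02 land in the same two-minute window.

```r
# Sample data in the shape shown in the question (assumed layout).
df <- data.frame(
    Time = c("00:01:00", "00:02:00", "00:03:00", "00:04:00"),
    Flow = c(2, 4, 5, 1),
    stringsAsFactors = FALSE)

# Hypothetical helper: convert "HH:MM:SS" strings to minutes since midnight.
to_minutes <- function(x) {
    parts <- strsplit(x, ":", fixed = TRUE)
    sapply(parts, function(p) {
        p <- as.numeric(p)
        p[1] * 60 + p[2] + p[3] / 60
    })
}

minutes <- to_minutes(df$Time)

# Index of the two-minute window, counted from the first observation.
window <- floor((minutes - minutes[1]) / 2)

# Mean flow per window; tapply returns a named array of means.
mean_flow <- tapply(df$Flow, window, mean)
```

Changing the divisor from 2 to another window length (in minutes) should adapt the sketch to other intervals.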

score 0 · Accepted Answer · answered Apr 27 '18 at 17:27

I assume you have continuous data without gaps, with values for Flow for every minute.

In base R we can use aggregate:

df.out <- data.frame(Time = df[seq(1, nrow(df), by = 2), "Time"])
df.out$mean_2min <- aggregate(
    df$Flow,
    by = list(rep(seq(1, nrow(df) / 2), each = 2)),
    FUN = mean)[, 2]
df.out
#      Time mean_2min
#1 00:01:00         3
#2 00:03:00         3

Explanation: Take the Time values from the odd rows of df; aggregate the values in column Flow over every 2 consecutive rows, and store the means in column mean_2min.
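
To make the grouping step concrete, here is a small sketch of the index that `rep` builds, together with a `split` plus `lapply` variant that keeps the means in a list, as the question asked. The variable names are illustrative, and the values are the four sample flows.

```r
flow <- c(2, 4, 5, 1)

# One group label per row: rows 1-2 get label 1, rows 3-4 get label 2.
group <- rep(seq_len(length(flow) / 2), each = 2)

# Split the values by label and take the mean of each piece.
means_list <- lapply(split(flow, group), mean)

# unlist() turns the list into a plain named vector if that is more convenient.
means_vec <- unlist(means_list)
```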

Sample data

df <- data.frame(
    Time = c("00:01:00", "00:02:00", "00:03:00", "00:04:00"),
    Flow = c(2, 4, 5, 1))

Maurits Evers


Thanks a lot for this answer! I have a follow-up question. What do I do if instead of a total even number of data (in this case I have 4 rows), I have an odd one (like 5)? Then aggregate complains because of the different lengths of the arguments. I am not interested in mean values which are not coming from 2 minutes intervals, so I do not care about row number 5. How do I tell this aggregate? Thanks very much in advance! – momo Apr 28 '18 at 13:09
@momo If it's always the *last* row, and you want to discard it, just remove the row with e.g. `df[-nrow(df), ]`. If your data are *not* continuous in 1 min steps, this becomes a different problem. – Maurits Evers Apr 28 '18 at 13:16
thanks for your answer. My data are continuous in 1 min steps, but I actually want to know the mean every 15 minutes (not every 2 as I wrote for simplification). Therefore sometimes I will have up to 14 rows that I might want to discard. I need somehow to say that I want to run on the maximum number of rows such that nrow/15 is an integer. Sorry if I am not clear. – momo Apr 28 '18 at 13:58
Thanks very much, it is solved now. I checked whether `(nrow(df)/2) %% 1 != 0` and, while that was true, ran `df <- df[-nrow(df), ]`. – momo Apr 28 '18 at 19:26
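
The trimming step discussed in the comments can be generalized to any window length, for example 15 rows. The following is a minimal sketch, assuming continuous 1-minute data with columns Time and Flow and at least `n` rows; the function name `mean_every_n` and its argument `n` are made up for this illustration.

```r
# Hypothetical helper: mean of Flow over consecutive, non-overlapping
# blocks of n rows, dropping any incomplete block at the end.
mean_every_n <- function(df, n) {
    stopifnot(nrow(df) >= n)              # need at least one full block
    keep <- nrow(df) - nrow(df) %% n      # largest multiple of n <= nrow(df)
    df <- df[seq_len(keep), , drop = FALSE]
    group <- rep(seq_len(keep / n), each = n)
    data.frame(
        Time = df$Time[seq(1, keep, by = n)],   # first timestamp of each block
        mean_flow = aggregate(df$Flow, by = list(group), FUN = mean)[, 2])
}

# Five rows with n = 2: the fifth row is discarded.
df <- data.frame(
    Time = c("00:01:00", "00:02:00", "00:03:00", "00:04:00", "00:05:00"),
    Flow = c(2, 4, 5, 1, 7))
result <- mean_every_n(df, 2)
```

Calling `mean_every_n(df, 15)` on the full data set would then give 15-minute means without a manual loop that removes rows one at a time.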
