
I collect data from several sensors once a minute, and each reading includes a timestamp. To visualize the data, I would like to summarize every 10 values by their mean.

My data looks like this:

   Temp Humidity Pressure                Time
1  21.9     66.1  1007.76 2017-07-24 18:13:02
2  21.9     66.2  1007.76 2017-07-24 18:14:05
3  21.9     66.2  1007.76 2017-07-24 18:15:02
4  22.0     65.8  1007.76 2017-07-24 18:16:02
5  22.0     66.1  1007.76 2017-07-24 18:17:02
6  22.0     66.2  1007.76 2017-07-24 18:18:02
7  22.0     66.1  1007.76 2017-07-24 18:19:02
8  22.0     66.3  1007.76 2017-07-24 18:20:02
9  22.0     66.3  1007.76 2017-07-24 18:21:02
10 22.0     66.3  1007.76 2017-07-24 18:22:02
11 22.0     66.0  1007.76 2017-07-24 18:23:02


# [...] about 1700 rows

I have working code, but it only works for the numeric columns:

aggregate(df,list(rep(1:(nrow(df)%/%n+1),each=n,len=nrow(df))),mean)[-1];

That gives me what I want for the first three columns, which are numeric vectors. But for the time column, which is of class POSIXlt, all I get is "2017-07-24 18:17:32" in every row. Does anyone know a solution for this? It wouldn't be a problem if I had to take the means of Time separately.

psalterium
  • Use `dput(head(df,10))` to present your data instead of the table that you have. – M-- Jul 27 '17 at 17:48
  • ...or `dput(droplevels(head(df, 11)))` if you have factor columns. – Gregor Thomas Jul 27 '17 at 17:57
  • IF the class of data is `POSIXt` then it should work. Without seeing the actual structure of your data we cannot help you. – M-- Jul 27 '17 at 18:01
  • @psalterium not sure what works exactly, but mostly because if you have characters you cannot get the mean of them. They should be class of `POSIXt` and `POSIXct`. – M-- Jul 27 '17 at 20:03
  • @Masoud I just put the time column at the first place, so the data now has the order "time, temp, humidity, pressure". I did this because I misread your answer. To my own surprise it now works perfectly. As written in my question, 'time' was of POSIXlt all the time. I just wonder why it behaves like this. – psalterium Jul 27 '17 at 20:09
  • Again, I, or anyone else who wants to help you, need to see the actual structure of your data. `dput` does that. Provide `dput` for the original form and the second form of your data (the one you say that it now works) so we can consult you further. Unfortunately, I am not a guru in R that knows every possible scenario off the top of my head. Cheers. – M-- Jul 27 '17 at 20:35
  • @Masoud I really didn't want to offend you, just ask for the details you need to know - I didn't know `dput` before. However, here it is: https://pastebin.com/SKsW0CXA – psalterium Jul 27 '17 at 21:17
  • @psalterium you really did not offend me. I asked for 10 rows of your data.frame not all of it. I should've pointed you to this thread. My bad. Please read [How to make a great reproducible example in R?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – M-- Jul 27 '17 at 21:24
  • @Masoud https://pastebin.com/eYr3f0aE – psalterium Jul 27 '17 at 21:35
  • @psalterium They are not the same. First one is `POSIXlt` and second one is `POSIXct` which is the correct one. That's why it works. – M-- Jul 27 '17 at 21:46
  • @Masoud I just created a new data frame with `new <- data.frame(time = "df$time", temp = "df$temp ...` so I didn't change the format actively. But good to know it works like that. Thanks for your help – psalterium Jul 27 '17 at 21:57
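
The distinction the comments arrive at can be checked directly. The following is a minimal sketch on a small made-up data frame `toy` (hypothetical values, not the poster's data) with an assumed block size `n` of 10. A `POSIXlt` value is a list of date-time components, whereas a `POSIXct` value is a single number of seconds per timestamp, so converting with `as.POSIXct()` before calling `aggregate()` lets the timestamps be averaged like the numeric columns.

```r
# Toy data standing in for the sensor log (hypothetical values, 25 one-minute rows)
start <- as.POSIXct("2017-07-24 18:13:00", tz = "UTC")
toy <- data.frame(
  Temp     = 21 + (0:24) / 10,
  Humidity = 66 - (0:24) / 10
)
toy$Time <- as.POSIXlt(start + 60 * (0:24))  # stored as POSIXlt, as in the question

class(toy$Time)             # "POSIXlt" "POSIXt"
is.list(unclass(toy$Time))  # TRUE: a list of components (sec, min, hour, ...)

# Convert to POSIXct: one number (seconds since 1970-01-01) per timestamp
toy$Time <- as.POSIXct(toy$Time)

# Same grouping idea as in the question: label consecutive blocks of n rows
n <- 10
block <- rep(seq_len(ceiling(nrow(toy) / n)), each = n, length.out = nrow(toy))

# Average every column per block and drop the grouping column
means <- aggregate(toy, by = list(block = block), FUN = mean)[-1]
means
```

Note that `data.frame()` converts `POSIXlt` input to `POSIXct` when it builds a column, which would be consistent with the rebuilt data frame working without any explicit conversion.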

1 Answer


You can create a `group` column that labels each block of 10 rows, group by it, and summarise all columns to get the mean of every 10 rows:

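library(dplyr)

# Label consecutive blocks of 10 rows: 1, 1, ..., 1, 2, 2, ...
n <- nrow(df)
df$group <- rep(seq_len(ceiling(n / 10)), each = 10)[seq_len(n)]

# Average every column within each block, then drop the helper column
df %>%
  group_by(group) %>%
  summarise_all(mean) %>%
  select(-group)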
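
For a self-contained check of the same idea, here is a hedged sketch on made-up data (the `toy` data frame and the `block_size` name are assumptions for illustration). It uses `across()`, which dplyr 1.0.0 and later recommend in place of `summarise_all()`, and it stores the timestamps as `POSIXct` so that `mean()` applies to them directly.

```r
library(dplyr)

# Hypothetical stand-in for the sensor data: 25 one-minute readings
toy <- data.frame(
  Temp = 21 + (0:24) / 10,
  Time = as.POSIXct("2017-07-24 18:13:00", tz = "UTC") + 60 * (0:24)
)

block_size <- 10  # rows per block (assumed name)

result <- toy %>%
  mutate(group = (row_number() - 1) %/% block_size) %>%  # 0, 0, ..., 0, 1, 1, ...
  group_by(group) %>%
  summarise(across(everything(), mean)) %>%              # mean of each column per block
  select(-group)

result
```

The last block here holds only 5 rows, so its means are taken over fewer readings than the others; whether to keep or drop such a partial block is a choice left to the reader.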
F. Privé