-3

I'd like to sum and average steps by date. My data has one column with the step counts for different time intervals and another column that repeats the date for each interval. How can I get the total and the mean number of steps for each day? See below.

Thanks!

Example

user20650
  • 24,654
  • 5
  • 56
  • 91
Ben
  • 49
  • 7
  • 3
    When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. A picture of data isn't helpful. Show what code you've tried so far. – MrFlick Feb 21 '18 at 22:22
  • 1
    search terms for "r aggregate by date" brought up several links that may be useful: https://stackoverflow.com/questions/24788450/r-aggregate-data-frame-with-date-column ; https://stackoverflow.com/questions/6052631/aggregate-daily-data-to-month-year-intervals ; https://stackoverflow.com/questions/14641874/summary-of-data-for-each-year-in-r ; https://stackoverflow.com/questions/37575785/r-group-by-date-and-summarize-the-values – user20650 Feb 21 '18 at 22:40

1 Answer

1

The ddply function from plyr always does a good job of this.

sumFrame <- plyr::ddply(df, "date", plyr::numcolwise(sum))
meanFrame <- plyr::ddply(df, "date", plyr::numcolwise(mean))

The first argument is your data frame.

The second argument is the column to group by. In this case it's date, but you can also give it a character vector with several column names, e.g. c("date", "time").

The final argument is the function to apply to each group, in this case sum and mean. Wrapping it in numcolwise makes the function run on each numeric column of the group separately and skip the non-numeric columns.
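To make the three arguments concrete, here is a minimal sketch on made-up data. The data frame and its column names (`date`, `interval`, `steps`) are assumptions, since the question did not include any sample data.

```r
library(plyr)

# Hypothetical sample data: step counts per interval, with the date repeated
df <- data.frame(
  date     = c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02"),
  interval = c(0, 5, 0, 5),
  steps    = c(10, 20, 5, 15),
  stringsAsFactors = FALSE
)

# Group by date, then apply sum / mean to every numeric column in each group
sumFrame  <- ddply(df, "date", numcolwise(sum))
meanFrame <- ddply(df, "date", numcolwise(mean))

print(sumFrame)   # one row per date; interval and steps are both summed
print(meanFrame)  # one row per date; interval and steps are both averaged
```

Note that every numeric column gets aggregated, so `interval` is summed along with `steps`. If that is not what you want, pass only the columns you need.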

As another note, as MrFlick said, please provide a reproducible example and show the solutions you've tried so far.

LachlanO
  • 1,152
  • 8
  • 14
  • Thanks for your response. I tried new_df <- plyr::ddply(df, "date", numcolwise(sum)) and new_df <- plyr::ddply(df, c("date", "hour"), numcolwise(sum)) and both of these just gave a column with NAs. Is there a way to specify which specific column you want to sum, rather than summing all other columns, which may not work as they might be string format? – Ben Feb 25 '18 at 21:55
  • The best bet would be to give example data when asking a question so it can be reproduced. The advice I can give you though, is if you want to set which columns you want to use, limit the input. So rather than using `df` as your first argument, use something like `df[,c("date", "hour")]` so you only give the function those two columns to work with. – LachlanO Feb 26 '18 at 02:00
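
A rough sketch of both ideas from the comments: limiting the input to the columns you care about, and naming the column to aggregate explicitly. The data and the column names (`hour`, `steps`, `note`) are made up. The guess that missing values in `steps` cause the NA results is only an assumption, because the original data was never posted.

```r
library(plyr)

# Hypothetical data with a missing step count and a string column
df <- data.frame(
  date  = c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02"),
  hour  = c(0, 1, 0, 1),
  steps = c(NA, 20, 5, 15),
  note  = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Option 1: pass only the grouping column and the column to aggregate
steps_only <- df[, c("date", "steps")]
with_na    <- ddply(steps_only, "date", numcolwise(sum))                # NA propagates
without_na <- ddply(steps_only, "date", numcolwise(sum, na.rm = TRUE))  # NA dropped

# Option 2: name the column explicitly with summarise
per_day <- ddply(df, "date", summarise,
                 total   = sum(steps, na.rm = TRUE),
                 average = mean(steps, na.rm = TRUE))
```

If the column you want to sum was read in as character or factor, it is not numeric, so `numcolwise` will skip it. Checking `str(df)` first shows the type of each column.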