Calculations by Subgroup in a Column

Question

I have a dataset that looks approximately like this:

> dataSet
   month detrend
1    Jan  315.71
2    Jan  317.45
3    Jan   317.5
4    Jan   317.1
5    Jan  315.71
6    Feb  317.45
7    Feb   313.5
8    Feb   317.1
9    Feb  314.37
10   Feb  315.41
11 March  316.44
12 March  315.73
13 March  318.73
14 March  315.55
15 March  312.64
.
.
.

How do I compute the average of detrend for each month? I want output shaped something like this (the numbers are only placeholders):

> by_month
   month ave_detrend
1    Jan  315.71
2    Feb  317.45
3  March   317.5

Please provide an actual "run-able" example. Also, it sounds like you know a good amount of statistics; please explain what you mean by "de-trend". Are you saying that the "detrended" data represents the residuals from the linear fit? Thanks. — Mike Williamson, Dec 07 '17 at 05:22
Hi @K.M.M, I heavily edited your question so that it is much easier for future readers (and helpers) to read. It's useful to know how to present questions on this board so that you get better answers faster. :) — Mike Williamson, Dec 07 '17 at 06:08

Mike Williamson · Answer 1 · 2017-12-13T22:55:03.457

What you need is a way to group your column of interest (detrend) by month and then summarize each group. There are ways to do this in base ("vanilla") R, for example with aggregate() or tapply(), but I find tidyverse's dplyr package the most readable way to do it.

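I will use an example adapted from the dplyr documentation for summarise():

library(dplyr)

# New names on purpose: summarise() evaluates its arguments in order, so
# disp = mean(disp) would make sd(disp) see one value per group and return NA
mtcars %>%
  group_by(cyl) %>%
  summarise(mean_disp = mean(disp), sd_disp = sd(disp))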

In your case, that would be (naming the new column ave_detrend to match the output you want):

library(dplyr)

by_month <- dataSet %>%
  group_by(month) %>%
  summarize(ave_detrend = mean(detrend))
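The comments ask for something runnable, so here is a minimal sketch that rebuilds only the 15 rows shown in the question (the full data has more rows, so real averages will differ). It also flags one detail: group_by() returns groups in sorted order, so a plain text month column would come out as Feb, Jan, March. Turning month into a factor whose levels follow the order of appearance keeps calendar order; that factor step is an addition of mine, not part of the answer above.

```r
library(dplyr)

# Rebuild the 15 rows shown in the question (the real data has more rows)
dataSet <- data.frame(
  month = rep(c("Jan", "Feb", "March"), each = 5),
  detrend = c(315.71, 317.45, 317.50, 317.10, 315.71,
              317.45, 313.50, 317.10, 314.37, 315.41,
              316.44, 315.73, 318.73, 315.55, 312.64),
  stringsAsFactors = FALSE
)

# Factor levels in order of first appearance, so the summary lists
# Jan, Feb, March instead of sorting the month names alphabetically
dataSet$month <- factor(dataSet$month, levels = unique(dataSet$month))

by_month <- dataSet %>%
  group_by(month) %>%
  summarize(ave_detrend = mean(detrend))

print(by_month)
```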

This new "tidyverse" style looks quite different, and you seem quite new, so I'll explain what's happening (sorry if this is overly obvious):

- First, we take the data frame, which I'm calling dataSet.
- Then we pipe (%>%) that data frame into the next function, group_by. Piping means taking the result of the previous command (here, just the data frame dataSet) and passing it as the first argument of the next function; group_by expects a data frame as its first argument.
- Then the result of group_by is piped to summarize (or summarise, if you're from down under like the package author). On its own, summarize computes over all the data in a column, but group_by partitions that column, so the mean is calculated once per partition, which here means once per month.
  - This is the key: group_by "flags" each row with its group, so summarize calculates the function (mean, in this case) separately on each group. For instance, all of the Jan values are grouped together and the mean is calculated only on them; then the mean is calculated on all of the Feb values, and so on.
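The piping described in this list can be checked directly: x %>% f(y) is another way of writing f(x, y). A small sketch on a made-up three-row data frame (toy is a hypothetical name and its values are illustrative only) compares the piped and nested spellings, and shows what summarize does without group_by:

```r
library(dplyr)

# Hypothetical toy data, only for comparing the two spellings
toy <- data.frame(month = c("Jan", "Jan", "Feb"),
                  detrend = c(1, 3, 10))

# Piped: toy becomes group_by()'s first argument, and that result
# becomes summarize()'s first argument
piped <- toy %>% group_by(month) %>% summarize(avg = mean(detrend))

# The same computation written as nested function calls
nested <- summarize(group_by(toy, month), avg = mean(detrend))
print(all.equal(piped, nested))

# Without group_by(), summarize() collapses the whole column to one row
ungrouped <- summarize(toy, avg = mean(detrend))
print(ungrouped)
```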

HTH!!

Jesse · Answer 2 · 2017-12-07T07:10:13.283

R has a built-in mean function: mean(x, trim = 0, na.rm = FALSE, ...)
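The two optional arguments in that signature matter in practice: na.rm = TRUE drops missing values before averaging, and trim (a single number from 0 to 0.5) discards that fraction of observations from each end of the sorted data before averaging. A short sketch with made-up numbers:

```r
# Made-up values, including one missing value and one outlier
x <- c(1, 2, 3, 4, 100, NA)

plain   <- mean(x)                             # NA, because one value is missing
no_na   <- mean(x, na.rm = TRUE)               # 22: average of the 5 non-missing values
trimmed <- mean(x, trim = 0.2, na.rm = TRUE)   # 3: drops 1 value from each end, averages 2, 3, 4

print(c(plain = plain, no_na = no_na, trimmed = trimmed))
```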

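I would do something like this:

# Keep only the January rows (the month column holds "Jan", not "january")
january <- dataSet[dataSet[, "month"] == "Jan", ]
# Pull out the detrend column for those rows and average it
januaryVector <- january[, "detrend"]
januaryAVG <- mean(januaryVector)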
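Repeating those lines for every month gets tedious. A base-R sketch of the same subset-then-average idea for all months at once, using tapply() and aggregate(); it rebuilds dataSet from only the rows shown in the question, so it is an illustration rather than the asker's full data:

```r
# Rows shown in the question (the full data has more)
dataSet <- data.frame(
  month = rep(c("Jan", "Feb", "March"), each = 5),
  detrend = c(315.71, 317.45, 317.50, 317.10, 315.71,
              317.45, 313.50, 317.10, 314.37, 315.41,
              316.44, 315.73, 318.73, 315.55, 312.64)
)

# tapply() splits detrend by month and applies mean() to each piece;
# the result is named by month, in alphabetical order
avg_by_month <- tapply(dataSet$detrend, dataSet$month, mean)
print(avg_by_month)

# aggregate() gives the same averages back as a data frame
by_month <- aggregate(detrend ~ month, data = dataSet, FUN = mean)
print(by_month)
```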

edited Dec 07 '17 at 07:10

answered Dec 07 '17 at 05:23

Jesse


I understand, but I am unsure how to incorporate that into the function I need (I just made a small edit to the question). I tried "if(Month==3) {mean(detrend)}", but it gives me the message "In if (Month == 3) { : the condition has length > 1 and only the first element will be used" – K.M.M Dec 07 '17 at 05:29
@K.M.M detrend is not separated into months, so you will need to do that. Also, you need to assign the result to a variable: `Average <- mean(new_detrend)` – Jesse Dec 07 '17 at 05:42
Ok, I tried that but still get an error (see below). – K.M.M Dec 07 '17 at 05:47
> January_detrend <- Month==3
> head(January_detrend)
[1] TRUE FALSE FALSE FALSE FALSE FALSE
> Average <- mean(detrend, January_detrend)
Error in mean.default(detrend, January_detrend) : 'trim' must be numeric of length one
– K.M.M Dec 07 '17 at 05:48
I don't think you read that right. I did January_detrend <- Month==3 and then head(January_detrend). It returns a vector showing which months are January. Now I need to take the mean of those months, i.e. the "TRUE" months. – K.M.M Dec 07 '17 at 05:55
@K.M.M I'll just update my answer with how I would go about it. – Jesse Dec 07 '17 at 05:57
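The problems in this thread have two separate causes. if() needs a single TRUE/FALSE, so a whole vector such as Month == 3 triggers the length warning (an error since R 4.2.0). And the second positional argument of mean() is trim, so mean(detrend, January_detrend) hands the logical vector to trim and fails. Indexing detrend with the logical vector first keeps only the TRUE positions. A sketch assuming detrend and Month are equal-length vectors, as in the comments; the values below are hypothetical:

```r
# Hypothetical stand-ins for the asker's vectors
Month   <- c(3, 1, 2, 3, 1, 3)
detrend <- c(315, 316, 317, 318, 319, 320)

January_detrend <- Month == 3   # logical vector: TRUE where Month is 3

# Wrong: the logical vector is matched to mean()'s 'trim' argument
failed <- try(mean(detrend, January_detrend), silent = TRUE)

# Right: keep only the TRUE positions of detrend, then average them
Average <- mean(detrend[January_detrend])
print(Average)
```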
