
I want to get the mean of a series of columns, grouped by station but kept separate by year.

My data looks something like this

+---------+------+-------+------+------+------+
| station | year | month | tmax | tmin | rain |
+---------+------+-------+------+------+------+
| A       | 2006 |     1 |   15 |   15 | NA   |
| A       | 2006 |     2 |   25 |   16 | 4    |
| A       | 2006 |     3 |   30 |   18 | 7    |
| A       | 2006 |     4 |   40 |   18 | 5    |
| A       | 2007 |     1 |   15 |   15 | 6    |
| A       | 2007 |     2 |   25 |   16 | 8    |
| A       | 2007 |     3 |   30 |   18 | 10   |
| A       | 2007 |     4 |   40 |   18 | 3    |
| A       | 2008 |     1 |   15 |   15 | 5    |
| A       | 2008 |     2 |   25 |   16 | 8    |
| A       | 2008 |     3 |   30 |   18 | 1    |
| A       | 2008 |     4 |   40 |   18 | 3    |
| B       | 2006 |     1 |   15 |   15 | NA   |
| B       | 2006 |     2 |   25 |   16 | 4    |
| B       | 2006 |     3 |   30 |   18 | 7    |
| B       | 2006 |     4 |   40 |   18 | 5    |
| B       | 2007 |     1 |   15 |   15 | 6    |
| B       | 2007 |     2 |   25 |   16 |      |
| B       | 2007 |     3 |   30 |   18 |      |
| B       | 2007 |     4 |   40 |   18 |      |
| B       | 2008 |       |      |      |      |
+---------+------+-------+------+------+------+

I've tried this, but I feel like I'm completely missing the point, and it doesn't give me the output I want:

t <- NewData %>% group_by(station) %>%
  summarise_at(vars(-station, -year), funs(mean(., na.rm=TRUE)))

I want to get something like this as an output

+---------+------+------+------+------+
| station | year | tmax | tmin | rain |
+---------+------+------+------+------+
| A       | 2006 |   15 |   15 | NA   |
| A       | 2007 |   25 |   16 | 4    |
| A       | 2008 |   30 |   18 | 7    |
| B       | 2006 |   40 |   18 | 5    |
| B       | 2007 |   15 |   15 | 6    |
| B       | 2008 |   25 |   16 | 8    |
| C       | 2006 |   30 |   18 | 10   |
| C       | 2007 |   40 |   18 | 3    |
| C       | 2008 |   15 |   15 | 5    |
+---------+------+------+------+------+

Thanks!

  • It is helpful if you post your data as output generated by `dput()`, so others can easily load your data into R. – Wimpel Feb 08 '19 at 09:59
  • Thanks Wimpel, I didn't know about dput() and wasn't sure how to do it! – JaneBirkin Feb 08 '19 at 10:01
  • This works for me on data that looks similar to yours: NewData %>% group_by(station, year) %>% summarise(tmax = mean(tmax, na.rm = TRUE), tmin = mean(tmin, na.rm = TRUE), rain = mean(rain, na.rm = TRUE)) – Chris Littler Feb 08 '19 at 10:06
  • @ChrisLittler It worked! Thank you so much. I pretty much had it, but I had tried group_by(station) %>% group_by(year) %>% as separate calls, and was using summarise_at(), which I think was somehow messing with how it grouped the data (I still can't get my head around the logic of dplyr most of the time). Cheers! – JaneBirkin Feb 08 '19 at 10:17
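
For anyone landing here later, here is a minimal sketch of the approach from the comments, written out so it can be run directly. The data frame is rebuilt by hand from a subset of the question's table (stations A and B, 2006 and 2007 only), and the blank rain cells for station B in 2007 are assumed to be NA. It also checks the likely reason the chained attempt did not work: by default a second group_by() replaces the earlier grouping rather than adding to it.

```r
library(dplyr)

# Subset of the question's table: stations A and B, years 2006 and 2007.
# Assumption: the blank rain cells for station B in 2007 are missing values (NA).
NewData <- data.frame(
  station = rep(c("A", "B"), each = 8),
  year    = rep(rep(c(2006, 2007), each = 4), times = 2),
  month   = rep(1:4, times = 4),
  tmax    = rep(c(15, 25, 30, 40), times = 4),
  tmin    = rep(c(15, 16, 18, 18), times = 4),
  rain    = c(NA, 4, 7, 5,  6, 8, 10, 3,  NA, 4, 7, 5,  6, NA, NA, NA)
)

# Group by both columns in a single call, then average each measurement.
# The result has one row per (station, year) pair.
yearly_means <- NewData %>%
  group_by(station, year) %>%
  summarise(
    tmax = mean(tmax, na.rm = TRUE),
    tmin = mean(tmin, na.rm = TRUE),
    rain = mean(rain, na.rm = TRUE)
  ) %>%
  ungroup()

print(yearly_means)

# Chaining two group_by() calls keeps only the last grouping,
# so the station grouping is lost here.
chained_groups <- NewData %>%
  group_by(station) %>%
  group_by(year) %>%
  group_vars()

print(chained_groups)
```

Listing both columns in one group_by() call is what keeps the years separate within each station. The month column is left out of summarise() on purpose, since its mean is not meaningful.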
