I need to group and summarize some data in which the first column contains year information. I am trying to do this, but I keep getting errors.

Example data is:

mydata = data.frame(Year = c(2001:2018), Dat = c(1:18))

I want to compute the mean of "Dat", grouped by "Year" in blocks of 4 (i.e. 2001, 2002, 2003 and 2004 = Group 1, and so forth).

What I am trying:

ggplot(mydata, aes(x=group_by((n=n(Year)/4)), y=Dat)) + stat_summary(fun.y="mean", geom="bar")

But this throws an error that I am not able to understand:

Error in n(Year) : unused argument (Year)

What am I doing wrong? Or is there an alternative way to do this?

Ronak Shah
LeMarque

2 Answers

With dplyr, you can try:

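library(dplyr)

# gl() builds a factor with ceiling(n() / 4) levels, each repeated 4 times, cut to n() rows
mydata %>%
 group_by(group = gl(ceiling(n() / 4), 4, length = n())) %>%
 summarise(Dat = mean(Dat))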

  group   Dat
  <fct> <dbl>
1 1       2.5
2 2       6.5
3 3      10.5
4 4      14.5
5 5      17.5

Just the grouping:

mydata %>%
 group_by(group = gl(ceiling(n() / 4), 4, length = n())) 

    Year   Dat group
   <int> <int> <fct>
 1  2001     1 1    
 2  2002     2 1    
 3  2003     3 1    
 4  2004     4 1    
 5  2005     5 2    
 6  2006     6 2    
 7  2007     7 2    
 8  2008     8 2    
 9  2009     9 3    
10  2010    10 3    
11  2011    11 3    
12  2012    12 3    
13  2013    13 4    
14  2014    14 4    
15  2015    15 4    
16  2016    16 4    
17  2017    17 5    
18  2018    18 5
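
To see what the grouping expression does on its own, here is a minimal base-R sketch. With 18 rows, `n()` evaluates to 18, so the call reduces to `gl(5, 4, length = 18)`: five levels, each repeated four times, truncated to 18 values. The names `n_rows` and `grp` are only illustrative and do not appear in the code above.

```r
# Stand-alone sketch of the grouping factor (n_rows and grp are illustrative names)
n_rows <- 18                                         # number of rows in mydata
grp <- gl(ceiling(n_rows / 4), 4, length = n_rows)   # same call as inside group_by()

# Count how many rows fall into each group
table(grp)
```

Because 18 is not a multiple of 4, the last group holds only two rows (2017 and 2018), which is why its mean in the summary is 17.5.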

And to get the graph (borrowing the plotting idea from @Ronak Shah):

library(ggplot2)

# Summarise first, then plot the precomputed group means as bars
mydata %>%
 group_by(group = gl(ceiling(n() / 4), 4, length = n())) %>%
 summarise(Dat = mean(Dat)) %>%
 ggplot(aes(group, Dat)) + 
 geom_bar(stat = "identity")
tmfmnk

I would keep the reshaping of the data and the plotting explicit:

library(dplyr)
library(ggplot2)

mydata %>%
   group_by(group = ceiling((1:nrow(mydata)/ 4))) %>%
   summarise(mean = mean(Dat)) %>%
   ggplot() + 
   aes(group, mean) + 
   geom_bar(stat = "identity")


However, using stat_summary you could do:

# Note: ggplot2 3.3.0 deprecated fun.y in favour of fun; use fun.y on older versions
ggplot(mydata) + 
     aes(x = ceiling((1:nrow(mydata)) / 4), y = Dat) + 
     stat_summary(fun = "mean", geom = "bar")
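
Both approaches above group by row position, which works here because the years are sorted and none are missing. If that cannot be relied on, one possible alternative is to derive the group from the Year value itself. The following is only a base-R sketch under that assumption; the names `group` and `group_means` are illustrative.

```r
mydata <- data.frame(Year = 2001:2018, Dat = 1:18)

# Group index derived from the Year value: 2001-2004 -> 1, 2005-2008 -> 2, ...
group <- (mydata$Year - min(mydata$Year)) %/% 4 + 1

# Mean of Dat within each group (base R, no extra packages needed)
group_means <- tapply(mydata$Dat, group, mean)
group_means
```

The same `(Year - min(Year)) %/% 4 + 1` expression could also be used inside `group_by()` or as the `x` aesthetic in place of the row-based index.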
Ronak Shah