2

I'm new to R and I'm trying to use ggplot to create one bar chart per id, all shown together. Each bar must represent the sum of the values in column d by month-year (column c). Column d contains both NA and numeric values.

My data frame, df, looks something like this, but it actually has around 10000 rows:

#Example of my data
a=c(1,1,1,1,1,1,1,1,3)
b=c("2007-12-03", "2007-12-10", "2007-12-17", "2007-12-24", "2008-01-07", "2008-01-14", "2008-01-21", "2008-01-28","2008-02-04")
c=format(as.Date(b),"%m-%Y")  # b is character, so convert to Date before formatting
d=c(NA,NA,NA,NA,NA,4.80, 0.00, 5.04, 3.84)
df=data.frame(a,b,c,d)
df

  a          b       c    d
1 1 2007-12-03 12-2007   NA
2 1 2007-12-10 12-2007   NA
3 1 2007-12-17 12-2007   NA
4 1 2007-12-24 12-2007   NA
5 1 2008-01-07 01-2008   NA
6 1 2008-01-14 01-2008 4.80
7 1 2008-01-21 01-2008 0.00
8 1 2008-01-28 01-2008 5.04
9 3 2008-02-04 02-2008 3.84

I tried to do my graph using this:

mplot<-ggplot(df,aes(y=d,x=c))+
       geom_bar()+
       theme(axis.text.x = element_text(angle=90, vjust=0.5))+
       facet_wrap(~ a)

I read from the help of geom_bar():

"geom_bar uses stat_count by default: it counts the number of cases at each x position"

So I thought it would work like that, but I'm getting this error:

Error: stat_count() must not be used with a y aesthetic.

For the sample I'm providing, I would like the graph for id 1 to show the months with NA as empty and 01-2008 with 9.84. For the second id, I would again like the months with NA to be empty and 02-2008 to show 3.84.

I also tried summing the data per month with aggregate and sum before plotting, and then using identity in the stat parameter of geom_bar, but I'm getting NA for some months and I don't know why.

I would really appreciate your help.

Jan
Giecod
  • Can you provide some of your data as described [here](https://stackoverflow.com/a/5963610/2582968)? What should the final graph look like? Do you want it to have one bar for 01-2008 with a height of 9.84 and a second bar for 02-2008 with 3.84 (based on the sample data)? – Jan Jun 29 '17 at 04:51
  • Exactly @Jan. I just edited the post with the expected results – Giecod Jun 29 '17 at 14:44
  • Like so: `ggplot(df, aes(y=d, x=c)) + geom_col() + theme(axis.text.x = element_text(angle=90, vjust=0.5))+facet_wrap(~ a)` – Jan Jun 29 '17 at 18:53
  • Thank you @Jan. It worked! My problem now is that I don't know why. According to the help, if I want to count, I use geom_bar and if I want to use the values in the data I use geom_col. So, do you know why it is working? Also, I added a reorder to the x-axis since I was getting 01-2007, 01-2008, 02-2007,... instead of 01-2007, 02-2007,... So far, my code looks like: `mplot<-ggplot(e, aes(y=d,x=reorder(format(as.Date(e$b),'%m-%Y'),e$b)))+ geom_col()+ theme(axis.text.x = element_text(angle=90, vjust=0.5))+ xlab("Months")+ facet_wrap(~ a) ` – Giecod Jun 29 '17 at 20:11
  • I posted this as an answer; it would be great if you could accept it. As for your question "do you know why it is working", I am not sure what you want to know. It works because you now use it as intended ;) – Jan Jun 30 '17 at 04:02

3 Answers

1

Do you want something like this:

mplot = ggplot(df, aes(x = b, y = d))+
  geom_bar(stat = "identity", position = "dodge")+
  facet_wrap(~ a)

mplot

I am using x = b instead of x = c for now.
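
If you do want month-year labels on the x-axis, the labels sort alphabetically by default (01-2008 before 12-2007). A minimal sketch of one way around that, assuming b holds dates as "YYYY-MM-DD" strings as in the question; the column name month and the helper variable month_levels are my own choices, not part of the original code:

```r
library(ggplot2)

# Sample data from the question
df <- data.frame(
  a = c(1, 1, 1, 1, 1, 1, 1, 1, 3),
  b = c("2007-12-03", "2007-12-10", "2007-12-17", "2007-12-24",
        "2008-01-07", "2008-01-14", "2008-01-21", "2008-01-28",
        "2008-02-04"),
  d = c(NA, NA, NA, NA, NA, 4.80, 0.00, 5.04, 3.84)
)

# Convert the character column to real dates so it can be sorted
df$b <- as.Date(df$b)

# Build the month-year labels in chronological order and use them
# as factor levels, so ggplot keeps that order on the x-axis
month_levels <- unique(format(sort(df$b), "%m-%Y"))
df$month <- factor(format(df$b, "%m-%Y"), levels = month_levels)

# Same plot as above, but with one x position per month
mplot <- ggplot(df, aes(x = month, y = d)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ a)

mplot
```

This avoids calling reorder() inside aes(), and the factor levels can be reused for other plots of the same data.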

AK88
1

You should use geom_col, not geom_bar. See the help text:

There are two types of bar charts: geom_bar makes the height of the bar proportional to the number of cases in each group (or if the weight aesthetic is supplied, the sum of the weights). If you want the heights of the bars to represent values in the data, use geom_col instead. geom_bar uses stat_count by default: it counts the number of cases at each x position. geom_col uses stat_identity: it leaves the data as is.

So your final line of code should be:

ggplot(df, aes(y=d, x=c)) + geom_col() + theme(axis.text.x = element_text(angle=90, vjust=0.5))+facet_wrap(~ a)
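
As for why this gives sums: geom_col() uses position = "stack" by default, so rows that share the same x value within a facet are stacked on top of each other and the total bar height equals their sum. If you would rather sum before plotting, here is a rough sketch with aggregate(). The names month (used instead of c) and monthly are my own, and my guess is that the NA results mentioned in the question come from calling sum without na.rm = TRUE, though I can't confirm that without seeing the code:

```r
library(ggplot2)

# Sample data from the question (month replaces the column called c)
df <- data.frame(
  a = c(1, 1, 1, 1, 1, 1, 1, 1, 3),
  b = as.Date(c("2007-12-03", "2007-12-10", "2007-12-17", "2007-12-24",
                "2008-01-07", "2008-01-14", "2008-01-21", "2008-01-28",
                "2008-02-04")),
  d = c(NA, NA, NA, NA, NA, 4.80, 0.00, 5.04, 3.84)
)
df$month <- format(df$b, "%m-%Y")

# Sum d per id and month.
# na.rm = TRUE is passed on to sum() so missing values are ignored;
# na.action = na.pass keeps the NA rows, so a month with only NA
# values still appears in the result (with a sum of 0).
monthly <- aggregate(d ~ a + month, data = df, FUN = sum,
                     na.rm = TRUE, na.action = na.pass)

# One row per id and month now, so geom_col() draws the sums directly
p <- ggplot(monthly, aes(x = month, y = d)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
  facet_wrap(~ a)

p
```

For id 1 this gives 9.84 for 01-2008 and a zero-height (empty) bar for 12-2007.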
Jan
0

No need to use geom_col as suggested by @Jan. Simply use the weight aesthetic instead:

ggplot(iris, aes(Species, weight=Sepal.Width)) + geom_bar() + ggtitle("summed sepal width")
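
Applied to the data from the question, that could look roughly like the sketch below. The columns month and w are names I made up for illustration, and I replace NA with 0 up front so the result does not depend on how geom_bar treats missing weights:

```r
library(ggplot2)

# Sample data from the question
df <- data.frame(
  a = c(1, 1, 1, 1, 1, 1, 1, 1, 3),
  b = as.Date(c("2007-12-03", "2007-12-10", "2007-12-17", "2007-12-24",
                "2008-01-07", "2008-01-14", "2008-01-21", "2008-01-28",
                "2008-02-04")),
  d = c(NA, NA, NA, NA, NA, 4.80, 0.00, 5.04, 3.84)
)
df$month <- format(df$b, "%m-%Y")

# Treat missing values as contributing nothing to the monthly sum
df$w <- ifelse(is.na(df$d), 0, df$d)

# geom_bar() sums the weights at each x position instead of counting rows
p <- ggplot(df, aes(x = month, weight = w)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
  facet_wrap(~ a) +
  ylab("sum of d")

p
```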
Holger Brandl