
I have the following data frame, where `variable` holds 10 different movie genre categories, e.g. drama, comedy, etc.

    > head(grossGenreMonthLong)
       Gross ReleasedMonth variable value
5   33508485             2    drama     1
6   67192859             2    drama     1
8      37865             4    drama     1
9   76665507             1    drama     1
10 221594911             2    drama     1
12    446438             2    drama     1

Reproducible data frame:

> dput(head(grossGenreMonthLong))
structure(list(Gross = c(33508485, 67192859, 37865, 76665507, 
221594911, 446438), ReleasedMonth = c(2, 2, 4, 1, 2, 2), variable = structure(c(1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("drama", "comedy", "short", "romance", 
"action", "crime", "thriller", "documentary", "adventure", "animation"
), class = "factor"), value = c(1, 1, 1, 1, 1, 1)), .Names = c("Gross", 
"ReleasedMonth", "variable", "value"), row.names = c(5L, 6L, 
8L, 9L, 10L, 12L), class = "data.frame")

I would like to calculate the mean gross for each month within each of the 10 genres and plot the results as separate bar charts using facets (one panel per genre).

In other words, what's a quick way to plot 10 bar charts of mean gross vs. month for each of the 10 genres?

DRozen
    Well, I guess you would start with data that had a `genre` variable. You need to learn how to pose a reproducible question: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – IRTFM Oct 10 '16 at 00:45
    use `dput()` to share the data – Hack-R Oct 10 '16 at 00:56
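As a quick illustration of that suggestion, here is a base R sketch of the round trip: `dput()` turns a data frame into R code you can paste into a question, and evaluating that code rebuilds the same object. The data frame `df` is a hand-built copy of the question's six rows; `capture.output()`, `parse()` and `eval()` are only there to simulate the copy-and-paste step.

```r
# Hand-built copy of the six rows from the question's dput() output.
df <- data.frame(
  Gross = c(33508485, 67192859, 37865, 76665507, 221594911, 446438),
  ReleasedMonth = c(2, 2, 4, 1, 2, 2),
  variable = factor(rep("drama", 6),
                    levels = c("drama", "comedy", "short", "romance", "action",
                               "crime", "thriller", "documentary",
                               "adventure", "animation")),
  value = rep(1, 6)
)

# dput() prints R code that recreates the object; capture it as text,
# i.e. what you would paste into the question.
shared_code <- paste(capture.output(dput(df)), collapse = "\n")

# Whoever answers pastes that text back in and gets the same data frame.
rebuilt <- eval(parse(text = shared_code))
all.equal(df, rebuilt)  # TRUE
```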

2 Answers


You should provide a reproducible example to make it easier for us to help you. `dput(my.dataframe)` is one way to do it, or you can generate an example data frame like the one below. Since you haven't given us a reproducible example, I'm going to put on my telepathy hat and assume the "variable" column in your output is the genre.

n = 100
movies <- data.frame(
  genre=sample(letters[1:10], n, replace=TRUE),
  gross=runif(n, min=1, max=1e7),
  month=sample(12, n, replace=TRUE)
)
head(movies)
#   genre     gross month
# 1     e 5545765.4     1
# 2     f 3240897.3     3
# 3     f 1438741.9     5
# 4     h 9101261.0     6
# 5     h  926170.8     7
# 6     f 2750921.9     1

(My genres are 'a', 'b', etc.)

To plot average gross per month, you first need to calculate it. One way is with the plyr package (data.table, dplyr, etc. also work):

library(plyr)
monthly.avg.gross <- ddply(movies,          # the input dataframe
                           .(genre, month), # group by these
                           summarize, avgGross=mean(gross)) # do this.

The dataframe monthly.avg.gross now has one row per (month, genre) with a column avgGross that has the average gross in that (month, genre).
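If you'd rather not depend on plyr, base R's `aggregate()` builds the same kind of table. This is a sketch of an alternative, not part of the original answer: it regenerates data shaped like `movies` (with an added `set.seed()` for reproducibility, so the numbers differ from the output shown above) and renames the summary column to `avgGross` to match.

```r
# Simulated data in the same shape as `movies`; set.seed() is an addition
# so the sketch gives the same result on every run.
set.seed(1)
n <- 100
movies <- data.frame(
  genre = sample(letters[1:10], n, replace = TRUE),
  gross = runif(n, min = 1, max = 1e7),
  month = sample(12, n, replace = TRUE)
)

# Base R counterpart of the ddply() call: one row per (genre, month)
# combination present in the data, holding the mean gross.
monthly.avg.gross <- aggregate(gross ~ genre + month, data = movies, FUN = mean)

# Rename the summary column to match the plyr version.
names(monthly.avg.gross)[names(monthly.avg.gross) == "gross"] <- "avgGross"

head(monthly.avg.gross)
```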

Now it's a matter of plotting. You have hinted at "facet", so I assume you're using ggplot2.

library(ggplot2)
ggplot(monthly.avg.gross, aes(x=month, y=avgGross)) +
       geom_point() +
       facet_wrap(~ genre)

You can also add month labels and treat month as a factor instead of a number, but that's peripheral to your question.
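A minimal base R sketch of the month-as-factor idea: the three rows are made-up stand-ins for `monthly.avg.gross`, and `month.abb` is R's built-in vector of abbreviated month names.

```r
# Made-up rows in the shape of monthly.avg.gross, for illustration only.
monthly.avg.gross <- data.frame(
  genre = c("a", "a", "b"),
  month = c(1, 12, 6),
  avgGross = c(100, 200, 300)
)

# Convert the numeric month to a factor with all 12 levels, labelled
# "Jan", "Feb", ..., so it is treated as categorical and stays in
# calendar order rather than being a continuous number.
monthly.avg.gross$month <- factor(monthly.avg.gross$month,
                                  levels = 1:12, labels = month.abb)

as.character(monthly.avg.gross$month)  # "Jan" "Dec" "Jun"
```

Mapped to x in ggplot2, this column gives a discrete, labelled axis; adding `scale_x_discrete(drop = FALSE)` should keep months with no data on the axis as well.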

mathematical.coffee

Thank you very much, @mathematical.coffee. I was able to adapt your answer to produce the bar charts I wanted:

library(plyr)
library(ggplot2)

meanGrossGenreMonth = ddply(grossGenreMonthLong,
                       .(ReleasedMonth, variable),
                       summarise,
                       mean.Gross = mean(Gross, na.rm = TRUE))

# bar charts of mean gross vs. month, one facet per genre;
# keep each `+` at the end of a line so R reads one expression
ggplot(meanGrossGenreMonth, aes(x = factor(ReleasedMonth), y = mean.Gross)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ variable) +
  ylab("mean Gross ($)") +
  xlab("Month") +
  ggtitle("Mean gross revenue vs. month released by Genre")
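One thing worth checking, stated as an assumption: the long data looks like the result of melting one 0/1 column per genre, in which case `value` marks whether a movie belongs to that genre. The six rows shown all have `value` 1, but if the full data also has `value == 0` rows, averaging them would mix in movies from other genres. A base R sketch that keeps only `value == 1` rows before averaging, run on the question's dput sample:

```r
# The six rows from the question's dput() output.
grossGenreMonthLong <- data.frame(
  Gross = c(33508485, 67192859, 37865, 76665507, 221594911, 446438),
  ReleasedMonth = c(2, 2, 4, 1, 2, 2),
  variable = factor(rep("drama", 6)),
  value = c(1, 1, 1, 1, 1, 1)
)

# Assumption: value == 1 means "this movie belongs to this genre".
inGenre <- subset(grossGenreMonthLong, value == 1)

# Mean gross per (month, genre). The formula interface drops rows with a
# missing Gross by default (na.action = na.omit), much like na.rm = TRUE.
meanGrossGenreMonth <- aggregate(Gross ~ ReleasedMonth + variable,
                                 data = inGenre, FUN = mean)
meanGrossGenreMonth
```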
DRozen