
I'm a complete R noob and am a bit stumped on one of my homework problems. Below is the type of histogram I am trying to create using ggplot2:

![Target histogram](https://i.stack.imgur.com/8CtcZ.jpg)

I have a dataset which specifies the release period, average rating, and the rating year of a list of movies.

My dataset includes a column called rating.year, where every observation is categorized as either "2004" or "2005", and another column called Release.period, which is used to label the X axis. The Y axis is the mean of all ratings of movies rated in 2004 and in 2005. I need to create a histogram that looks identical to the one shown, where the red bar represents the average rating of all movies rated in 2004 and the blue bar represents the average of all movies rated in 2005.

So my question is: Using ggplot2, how do I calculate the mean of the ratings for the respective years and plot it onto a histogram, and how do I create two separate bars as shown in the model histogram?

  • 1
    Welcome to SO! Please make this question *reproducible*. This includes sample code (including listing non-base R packages), sample data (e.g., `dput(head(x))`), and expected output. Refs: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. – r2evans Apr 12 '19 at 05:38

1 Answer


You can use the dplyr package to summarise() your data:

library(ggplot2)
library(dplyr)

# create data
factors <- expand.grid(c(2004, 2005), c('1940-1960', '1960-1980', '1980-2000', '2000-2010'))
set.seed(42)
ratings <- runif(50, 2.5, 3.2)
data <- c()
for (i in seq_along(ratings)) {
  fact <- sample(1:nrow(factors), 1)
  data <- rbind(data, cbind(factors[fact, ], ratings[i]))
}
names(data) <- c('rating.year', 'Release.period', 'rating')
data$rating.year <- factor(data$rating.year)

# calculate the mean of ratings
data.sum <- data %>% group_by(rating.year, Release.period) %>% summarise(rating=mean(rating))

# plot the data
gg <- ggplot(aes(x=Release.period, y=rating, fill=rating.year), data=data.sum) + ylab('Mean of the Average Ratings')
# in geom_bar()
# stat='identity' will make the bars the height of your y-variable, i.e. rating
# position = 'dodge' will place bars with different fill next to each other
gg <- gg + geom_bar(stat='identity', width=0.5, position = 'dodge')
print(gg)
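
If you would rather not build a separate summary table, ggplot2 can compute the group means for you while plotting. The following is only a sketch under a few assumptions: the `toy` data frame is made-up data that borrows the column names from the question, and the `fun` argument of `stat_summary()` assumes a reasonably recent ggplot2 (older releases used `fun.y` instead).

```r
library(ggplot2)

# Hypothetical toy data using the column names from the question
set.seed(42)
periods <- c('1940-1960', '1960-1980', '1980-2000', '2000-2010')
toy <- data.frame(
  rating.year    = factor(sample(c(2004, 2005), 50, replace = TRUE)),
  Release.period = factor(sample(periods, 50, replace = TRUE), levels = periods),
  rating         = runif(50, 2.5, 3.2)
)

# stat_summary() applies mean() to rating within each x/fill group,
# then draws one bar per group; position = 'dodge' puts them side by side
gg2 <- ggplot(toy, aes(x = Release.period, y = rating, fill = rating.year)) +
  stat_summary(fun = mean, geom = 'bar', position = 'dodge', width = 0.5) +
  ylab('Mean of the Average Ratings')
print(gg2)
```

This should give the same kind of plot as the `summarise()` approach; the trade-off is that the means live only inside the plot, so if you also need the numbers themselves the explicit summary table is the more convenient route.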


Simon