I am trying to create a boxplot from the data frame grades_software, with software (R/SPSS) as the discrete x variable and grades as the continuous y variable.

I used the following code:

library(ggplot2)
ggplot(grades_software, aes(software, grades_software$final_score)) + 
geom_boxplot(fill = fill, colour = line) +
  scale_y_continuous(name = "final_score",
                     breaks = seq(0, 175, 25),
                     limits=c(0, 175)) +
  scale_x_discrete(name = "software") +
  ggtitle("Distribution of Final Course Scores by Software Used")

However, I get the error stated above:

Aesthetics must be either length 1 or the same as the data (100): x, y

I also don't understand what `breaks = seq(...)` and `limits` do in this code.

Neoromanzer
  • Hello and welcome to Stack Overflow. Please read [this answer](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) on how to make a great R reproducible example. – JustCurious Nov 05 '17 at 12:36
  • Try replacing `aes(software, grades_software$final_score)` with `aes(software, final_score)`. The data frame is already passed to `ggplot()`, so `aes()` only needs the column names. – mnm Nov 05 '17 at 12:44

1 Answer

You don't need `$` to refer to columns inside `aes()`. ggplot looks the column names up in the data frame you pass as its first argument.

Try

library(ggplot2)

# fill and line are assumed to be colour values defined earlier in your script
ggplot(grades_software, aes(x = software, y = final_score)) +
  geom_boxplot(fill = fill, colour = line) +
  scale_y_continuous(name = "final_score",
                     breaks = seq(0, 175, 25),
                     limits = c(0, 175)) +
  scale_x_discrete(name = "software") +
  ggtitle("Distribution of Final Course Scores by Software Used")

`breaks` controls where the axis ticks and major gridlines are drawn. `seq(from, to, by)` builds the vector of positions, so in your example `seq(0, 175, 25)` puts a break every 25 units from 0 to 175. `limits`, on the other hand, is a numeric vector of length two giving the lower and upper ends of the scale, in your case 0 and 175.
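To see both arguments in isolation, here is a minimal sketch. The data frame is a made-up stand-in for your `grades_software` (100 rows, random scores), and `score_breaks` is just a name I picked for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for grades_software: 100 rows, two columns.
set.seed(1)
grades_software <- data.frame(
  software    = rep(c("R", "SPSS"), each = 50),
  final_score = round(runif(100, min = 20, max = 170))
)

# seq(from, to, by) returns a plain numeric vector of break positions.
score_breaks <- seq(0, 175, 25)
print(score_breaks)  # 0 25 50 75 100 125 150 175

p <- ggplot(grades_software, aes(x = software, y = final_score)) +
  geom_boxplot() +
  scale_y_continuous(name = "final_score",
                     breaks = score_breaks,  # ticks and major gridlines
                     limits = c(0, 175))     # axis runs from 0 to 175
print(p)
```

One thing to keep in mind: with `limits` on the scale, any values outside the range are dropped (with a warning) before the boxplot statistics are computed. If you only want to zoom in without discarding data, `coord_cartesian(ylim = c(0, 175))` is the usual alternative.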

Neoromanzer