To make this plot with ggplot2, you need to prepare the data before plotting. In the solution below, I divide the continuous x-variable into a discrete variable with cut(), then use aggregate() to sum the y-values within each bin of x-values. Besides the base R function aggregate(), there are many other ways to summarize, aggregate, and reshape your data. You may wish to look into the dplyr or data.table packages (two very powerful, well-supported packages).
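Before the full solution, here is a minimal sketch of what the two steps do on their own. The toy data frame and the names toy, x_bin, and toy_sums are made up for illustration and are not part of the solution.

```r
# Toy data: five x-values with matching y-values (invented for illustration).
toy <- data.frame(x = c(1, 2, 4, 7, 9), y = c(10, 20, 30, 40, 50))

# Three equal-width bins. cut() makes right-closed intervals by default,
# so the bins are (0,3], (3,6] and (6,9].
toy$x_bin <- cut(toy$x, breaks = c(0, 3, 6, 9))

# Sum y within each bin.
toy_sums <- aggregate(y ~ x_bin, data = toy, FUN = sum)
print(toy_sums)
#   x_bin  y
# 1 (0,3] 30
# 2 (3,6] 30
# 3 (6,9] 90
```

Note that with the default settings a value exactly equal to the lowest break falls outside every bin and becomes NA. Either start the breaks slightly below the smallest value or pass include.lowest=TRUE to cut().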
library(ggplot2)
# Use the built-in data set `mtcars` to make the example reproducible.
# Run ?mtcars to see a description of the data set.
head(mtcars)
#                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
# Let's use `disp` (engine displacement) as the x-variable
# and `mpg` (miles per gallon) as the y-variable.
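# Bin the `disp` column into a discrete variable with `cut()`.
# The edges 71 and 472 bracket the observed range of `disp` (71.1 to 472),
# giving 20 equal-width bins.
disp_bin_edges = seq(from=71, to=472, length.out=21)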
mtcars$disp_discrete = cut(mtcars$disp, breaks=disp_bin_edges)
# Use `aggregate()` to sum `mpg` over levels of `disp_discrete`,
# creating a new data.frame.
dat = aggregate(mpg ~ disp_discrete, data=mtcars, FUN=sum)
# Use `geom_bar(stat="identity")` to plot pre-computed y-values.
p1 = ggplot(dat, aes(x=disp_discrete, y=mpg)) +
  geom_bar(stat="identity") +
  scale_x_discrete(drop=FALSE) +
  theme(axis.text.x=element_text(angle=90)) +
  ylab("Sum of miles per gallon") +
  xlab("Displacement, binned")
# For this example data, a scatterplot conveys a clearer story.
p2 = ggplot(mtcars, aes(x=disp, y=mpg)) +
  geom_point(size=5, alpha=0.4) +
  ylab("Miles per gallon") +
  xlab("Displacement")
# Arrange both plots side by side and save them to a single file.
library(gridExtra)
ggsave("plots.png", arrangeGrob(p1, p2, nrow=1), height=4, width=8, dpi=150)
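
For comparison, the binning and summing step could also be written with dplyr. This is only a sketch, assuming dplyr 1.0.0 or later is installed. The names dat_dplyr and dat_base are made up here, and the last lines simply cross-check the result against aggregate().

```r
library(dplyr)

# Same bin edges as in the base R solution.
disp_bin_edges <- seq(from = 71, to = 472, length.out = 21)

# Bin `disp`, then sum `mpg` within each bin.
# Like aggregate(), group_by() drops empty bins by default.
dat_dplyr <- mtcars %>%
  mutate(disp_discrete = cut(disp, breaks = disp_bin_edges)) %>%
  group_by(disp_discrete) %>%
  summarise(mpg = sum(mpg), .groups = "drop")

# Cross-check against the base R approach.
dat_base <- aggregate(
  mpg ~ disp_discrete,
  data = transform(mtcars, disp_discrete = cut(disp, breaks = disp_bin_edges)),
  FUN = sum
)
stopifnot(isTRUE(all.equal(dat_dplyr$mpg, dat_base$mpg)))
```

The resulting dat_dplyr can be passed to ggplot() in place of dat, since it has the same two columns.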
