In R, how should I draw a histogram with a categorical variable on the x-axis and the frequency of a continuous variable on the y-axis? Is this correct?
-
No, I don't think it is correct. You probably want a bar plot, not a histogram. Compute aggregate counts by the categorical variable and plot the result. This is probably a duplicate. In the meantime, see [this comment](https://stackoverflow.com/questions/64392990/how-to-create-a-barplot-in-r-with-frequencies-on-the-y-axis-not-the-densities#comment113865091_64392990). – Rui Barradas Dec 24 '21 at 11:22
-
Try asking your question on `https://stats.stackexchange.com`. Cross Validated is a question and answer site for people interested in statistics, machine learning, etc. – psychonomics Jan 02 '22 at 14:46
1 Answer
There are a couple of ways one could interpret "one graph" in the title of the question. That said, using the `ggplot2` package, there are at least two ways to render histograms by group on a single page of results.
First, we'll create a data frame that contains a normally distributed random variable with a mean of 100 and a standard deviation of 20. We also include a group variable that takes one of four values: A, B, C, or D.
```r
library(ggplot2) # used for the plots that follow

set.seed(950141237) # for reproducibility of results
df <- data.frame(group = rep(c("A", "B", "C", "D"), 200),
                 y_value = rnorm(800, mean = 100, sd = 20))
```
The resulting data frame has 800 rows of randomly generated values from a normal distribution, assigned into 4 groups of 200 observations.
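To confirm that structure before plotting, a quick check along the following lines can help. This is only a sketch that rebuilds the same data frame; the variable name `group_sizes` is illustrative and not part of the original answer.

```r
# Rebuild the example data so this snippet runs on its own
set.seed(950141237)
df <- data.frame(group = rep(c("A", "B", "C", "D"), 200),
                 y_value = rnorm(800, mean = 100, sd = 20))

# Number of rows, and number of observations per group
nrow(df)
group_sizes <- table(df$group)
group_sizes

# Sample mean and standard deviation within each group
aggregate(y_value ~ group, data = df,
          FUN = function(v) c(mean = mean(v), sd = sd(v)))
```

The per-group means and standard deviations should be close to, but not exactly, 100 and 20, since each group is a random sample.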
Next, we will render this in `ggplot2::ggplot()` as a histogram, where the color of the bars is based on the value of `group`.
```r
ggplot(data = df, aes(x = y_value, fill = group)) + geom_histogram()
```
...and the resulting chart looks like this:
In this style of histogram the values from each group are stacked atop each other (i.e. the frequency of group A is added to B, etc. before rendering the chart), which might not be what the original poster intended.
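If overlapping rather than stacked bars are wanted, one option is to pass `position = "identity"` to `geom_histogram()` and make the bars semi-transparent. The following is a minimal sketch on the same simulated data; the `alpha` and `bins` values and the name `p_overlap` are arbitrary choices for illustration, not something the original poster asked for.

```r
library(ggplot2)

# Same simulated data as in the answer
set.seed(950141237)
df <- data.frame(group = rep(c("A", "B", "C", "D"), 200),
                 y_value = rnorm(800, mean = 100, sd = 20))

# position = "identity" draws every group's bars from zero instead of
# stacking them; alpha makes the overlapping bars see-through
p_overlap <- ggplot(data = df, aes(x = y_value, fill = group)) +
  geom_histogram(position = "identity", alpha = 0.5, bins = 30)
p_overlap
```

With four groups the overlap can get hard to read, which is one reason to consider the faceted version as well.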
We can verify the "stacking" behavior by removing the `fill = group` argument from `aes()`.
```r
# verify the stacking behavior
ggplot(data = df, aes(x = y_value)) + geom_histogram()
```
...and the output looks just like the first chart, but drawn in a single color.
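The same check can be done numerically rather than by eye. The sketch below uses `ggplot_build()` to pull the computed bin counts out of both plots and compares them. It assumes `bins = 30` (set explicitly here; it matches the ggplot2 default), and the variable names are illustrative.

```r
library(ggplot2)

# Same simulated data as in the answer
set.seed(950141237)
df <- data.frame(group = rep(c("A", "B", "C", "D"), 200),
                 y_value = rnorm(800, mean = 100, sd = 20))

p_grouped <- ggplot(df, aes(x = y_value, fill = group)) + geom_histogram(bins = 30)
p_single  <- ggplot(df, aes(x = y_value)) + geom_histogram(bins = 30)

# Layer data computed by ggplot2: one row per bin (and per group, if any)
grouped <- ggplot_build(p_grouped)$data[[1]]
single  <- ggplot_build(p_single)$data[[1]]

# Add up the per-group counts within each bin, keyed by the bin's left edge
stacked_totals <- tapply(grouped$count, round(grouped$xmin, 6), sum)

# Should be TRUE if the stacked bars have the same heights as the ungrouped bars
isTRUE(all.equal(as.numeric(stacked_totals), single$count))
```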
Another way to render the data is to use `group` with `facet_wrap()`, where each distribution appears in a different facet on one chart.
```r
ggplot(data = df, aes(x = y_value)) + geom_histogram() + facet_wrap(~group)
```
The resulting chart looks like this:
The facet approach makes it easier to see differences in frequency of y values between the groups.
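If what is actually wanted is the categorical variable on the x-axis, then, as the first comment under the question suggests, a bar plot of aggregated counts is likely the better fit. The following is a minimal sketch of that idea on the same simulated data; the names `counts` and `p_bar` are illustrative, and `Freq` is simply the column name that `as.data.frame()` gives to a table's counts.

```r
library(ggplot2)

# Same simulated data as in the answer
set.seed(950141237)
df <- data.frame(group = rep(c("A", "B", "C", "D"), 200),
                 y_value = rnorm(800, mean = 100, sd = 20))

# Aggregate counts by the categorical variable
counts <- as.data.frame(table(group = df$group))

# One bar per category, with bar height equal to the number of observations
p_bar <- ggplot(counts, aes(x = group, y = Freq)) +
  geom_col() +
  labs(y = "Frequency")
p_bar
```

Because the example data assigns exactly 200 observations to each group, all four bars have the same height here; real data would usually show differences.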
