
How, in R, should I make a histogram with a categorical variable on the x-axis and the frequency of a continuous variable on the y-axis? Is this correct?

Samin Ba
  • No, I don't think it is correct. You probably want a bar plot, not a histogram. Compute aggregate counts by the categorical variable and plot the result. This is probably a duplicate. In the meantime, see [this comment](https://stackoverflow.com/questions/64392990/how-to-create-a-barplot-in-r-with-frequencies-on-the-y-axis-not-the-densities#comment113865091_64392990). – Rui Barradas Dec 24 '21 at 11:22
  • Try asking your question on `https://stats.stackexchange.com`. Cross Validated is a question and answer site for people interested in statistics, machine learning, etc. – psychonomics Jan 02 '22 at 14:46
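A minimal base-R sketch of what the first comment suggests: count the observations in each category, then draw a bar plot of those counts. The data frame `dat`, its columns `category` and `score`, and the seed are hypothetical and exist only for illustration.

```r
# Hypothetical example data: one categorical and one continuous variable
set.seed(1)
dat <- data.frame(
  category = factor(sample(c("A", "B", "C"), 150, replace = TRUE),
                    levels = c("A", "B", "C")),
  score = rnorm(150, mean = 50, sd = 10)
)

# Aggregate: number of observations in each category
counts <- table(dat$category)

# Bar plot: categories on the x-axis, frequencies (counts) on the y-axis
barplot(counts, xlab = "category", ylab = "Frequency")
```

If the y-axis should instead show a summary of the continuous variable (for example, its total per category), `tapply(dat$score, dat$category, sum)` returns a named vector that `barplot()` accepts in the same way.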

1 Answer


There are a couple of ways one could interpret "one graph" in the title of the question. That said, using the ggplot2 package, there are at least two ways to render histograms by group on a single page of output.

First, we'll create a data frame that contains a normally distributed random variable with a mean of 100 and a standard deviation of 20. We also include a group variable that takes one of four values: A, B, C, or D.

```r
library(ggplot2)

set.seed(950141237) # for reproducibility of results
df <- data.frame(group = rep(c("A", "B", "C", "D"), 200),
                 y_value = rnorm(800, mean = 100, sd = 20))
```

The resulting data frame has 800 rows of randomly generated values from a normal distribution, split evenly into four groups of 200 observations each.
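To confirm that structure, a few base-R checks can be run on `df`. This sketch re-creates the same data so it runs on its own. Note that `rep(c("A","B","C","D"), 200)` cycles through the labels, so the groups are interleaved row by row rather than stored in blocks; `rep(..., each = 200)` would produce blocks instead.

```r
set.seed(950141237) # same seed as in the answer
df <- data.frame(group = rep(c("A", "B", "C", "D"), 200),
                 y_value = rnorm(800, mean = 100, sd = 20))

nrow(df)            # total number of rows: 800
table(df$group)     # observations per group: 200 each
head(df$group, 8)   # labels cycle A, B, C, D, A, B, C, D
```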

Next, we will render this with `ggplot2::ggplot()` as a histogram, where the fill color of the bars is based on the value of `group`.

```r
ggplot(data = df, aes(x = y_value, fill = group)) + geom_histogram()
```

...and the resulting chart looks like this:

[Image: histogram of y_value with bars filled by group]

In this style of histogram the values from each group are stacked atop each other (i.e., the counts for group A are added to those for B, and so on, before the chart is rendered), which might not be what the original poster intended.

We can verify the "stacking" behavior by removing the fill = group argument from aes().

```r
# verify the stacking behavior
ggplot(data = df, aes(x = y_value)) + geom_histogram()
```

...and here is the output. Its bars have the same heights as the stacked bars in the first chart, but they are drawn in a single color.

[Image: histogram of y_value drawn in a single color]
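The stacking can also be checked numerically. The base-R sketch below bins each group with one shared set of breaks and confirms that, in every bin, the group counts add up to the count for the whole variable, which is what the height of a stacked bar represents. The breaks come from `pretty()` and are not the same bins that `geom_histogram()` picks by default (30 bins); only the arithmetic is being illustrated.

```r
set.seed(950141237) # same data as in the answer
df <- data.frame(group = rep(c("A", "B", "C", "D"), 200),
                 y_value = rnorm(800, mean = 100, sd = 20))

# One common set of breaks so every group is binned identically
breaks <- pretty(range(df$y_value), n = 30)

# Per-group counts in each bin (rows = bins, columns = groups);
# plot = FALSE returns the counts without drawing anything
group_counts <- sapply(split(df$y_value, df$group),
                       function(v) hist(v, breaks = breaks, plot = FALSE)$counts)

# Counts for the whole variable, ignoring group
total_counts <- hist(df$y_value, breaks = breaks, plot = FALSE)$counts

# Stacked bar height in each bin = sum of the group counts in that bin
all(rowSums(group_counts) == total_counts)  # TRUE
```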

Another way to render the data is to use group with facet_wrap(), where each distribution appears in a different facet on one chart.

```r
ggplot(data = df, aes(x = y_value)) + geom_histogram() + facet_wrap(~group)
```

The resulting chart looks like this:

[Image: four histograms of y_value, one facet per group]

The facet approach makes it easier to see differences in frequency of y values between the groups.
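For readers not using ggplot2, a rough base-R analogue of the faceted chart draws one histogram per group in a 2 x 2 grid with `par(mfrow = ...)`. This swaps base graphics in for `facet_wrap()`; the shared `breaks` and `ylim` are an added assumption so the panels are directly comparable, much as `facet_wrap()` keeps axes fixed across panels by default.

```r
set.seed(950141237) # same data as in the answer
df <- data.frame(group = rep(c("A", "B", "C", "D"), 200),
                 y_value = rnorm(800, mean = 100, sd = 20))

# Shared bins so every panel is binned the same way
breaks <- pretty(range(df$y_value), n = 30)
groups <- split(df$y_value, df$group)

# Compute the histograms first (no drawing) to find a common y-axis limit
h <- lapply(groups, hist, breaks = breaks, plot = FALSE)
ymax <- max(sapply(h, function(x) max(x$counts)))

old_par <- par(mfrow = c(2, 2))   # 2 x 2 grid of panels
for (g in names(h)) {
  plot(h[[g]], main = paste("group", g), xlab = "y_value",
       ylim = c(0, ymax))
}
par(old_par)                      # restore the previous layout
```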

Len Greski