I have a "long"-format data frame with two columns: the first holds the values, the second the sex (1 = male, 2 = female). I wrote the code below to draw a histogram of the entire dataset.

ggplot(kz6, aes(x = values)) + 
  geom_histogram()

However, I also want to overlay density curves on the histogram to emphasize the difference between the sexes, i.e. I want to combine three plots: a histogram of the entire dataset and one density curve for each sex. I tried several examples (one, two, three, four), but it still does not work. The density-only code works, but the histogram + density combinations do not.

density <- ggplot(kz6, aes(x = values, fill = factor(sex))) + 
  geom_density()

both <- ggplot(kz6, aes(x = values)) + 
  geom_histogram() +
  geom_density()

both_2 <- ggplot(kz6, aes(x = values)) + 
  geom_histogram() +
  geom_density(aes(x = kz6[kz6$sex == 1,]))

P.S. Some examples contain y = ..density..; what does it mean, and how should I interpret it?

jeparoff
    Does one of the answers posted [here](https://stackoverflow.com/questions/5688082/overlay-histogram-with-density-curve) suit your needs? – Punintended Feb 28 '20 at 18:39

1 Answer

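To superimpose one density curve per level of a categorical variable on a histogram, map that variable to an aesthetic such as group or colour in the call to geom_density. Setting y = ..density.. in geom_histogram puts the bars on the same density scale as the curves.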

ggplot(kz6, aes(x = values)) +
  geom_histogram(aes(y = ..density..), bins = 20) +
  geom_density(aes(group = sex, colour = sex), adjust = 2)

Data creation code.

I will create a test data set from the built-in iris data set.

kz6 <- iris[iris$Species != "virginica", 4:5]
kz6$sex <- "M"
kz6$sex[kz6$Species == "versicolor"] <- "F"
kz6$Species <- NULL
names(kz6)[1] <- "values"
head(kz6)
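
The test data above stores sex as "M"/"F", while the question codes it as 1/2. A numeric column mapped to colour is treated as continuous, so it may help to convert it to a factor first. The following is only a sketch under assumptions: the data are simulated stand-ins (not the asker's real kz6), and after_stat(density) is used as the newer spelling of ..density.., which recent ggplot2 releases flag as deprecated.

```r
library(ggplot2)

# Hypothetical stand-in for the question's data: numeric sex codes 1 and 2
set.seed(1)
kz6 <- data.frame(
  values = c(rnorm(100, mean = 175, sd = 7), rnorm(100, mean = 162, sd = 6)),
  sex    = rep(c(1, 2), each = 100)
)

# Turn the numeric codes into a labelled factor so colour is discrete
kz6$sex <- factor(kz6$sex, levels = c(1, 2), labels = c("Male", "Female"))

p <- ggplot(kz6, aes(x = values)) +
  # histogram of the whole data set, rescaled to the density scale
  geom_histogram(aes(y = after_stat(density)), bins = 20) +
  # one density curve per sex; colour = sex also defines the grouping
  geom_density(aes(colour = sex), adjust = 2)
p
```

Note that each curve integrates to 1 on its own, so the two curves describe the shape of each group rather than its share of the overall histogram.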

Rui Barradas
  • Thank you, it works, but could you explain how `y = ..density..` works and what it means? – jeparoff Feb 28 '20 at 21:51
  • @HermanCherniaiev In `help("geom_histogram")`, the section `Computed variables` lists several variables that the stat calculates. Wrap any of them in double dots (two before, two after) to access its values. `..density..` is the one you want. – Rui Barradas Feb 28 '20 at 23:22
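
To make the computed variables mentioned in the comments more concrete, here is a small sketch that pulls them out of a built plot with layer_data() and checks how density relates to count. It assumes the iris-based test data from the answer and writes the mapping as after_stat(density), the newer equivalent of ..density..; the variable names bins, bin_width, manual and area are just illustrative.

```r
library(ggplot2)

# Same test data idea as in the answer: petal widths of two iris species
kz6 <- iris[iris$Species != "virginica", 4:5]
names(kz6)[1] <- "values"

p <- ggplot(kz6, aes(x = values)) +
  geom_histogram(aes(y = after_stat(density)), bins = 20)

# Data computed by the histogram layer, one row per bin
bins <- layer_data(p, 1)
bin_width <- bins$xmax - bins$xmin

# density is the bin count divided by (total count * bin width)
manual <- bins$count / (sum(bins$count) * bin_width)
all.equal(bins$density, manual)

# so the bar areas add up to 1, like the area under a density curve
area <- sum(bins$density * bin_width)
area
```

Read this way, a bar's height on the density scale is the proportion of observations in that bin per unit of x, which is why the histogram and geom_density curves can share one y axis.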