I'm trying to overlay density plots for an outcome variable that is expressed as an integer scale (1-7). Right now I'm using:

ggplot(dface, aes(Current.Mood, fill = NewCode))+ geom_density(alpha = 0.1)

That gets me:

enter image description here

For some reason I don't understand, ggplot is putting valleys between the integer values. They make the plot very hard to interpret and don't really reflect what's happening in my data. Does anyone know how I can smooth these out?

    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. It appears your data is discrete. Density plots are meant for continuous data. You can try mucking about with the `adjust=` parameter to change the default bandwidth option. A bar chart is probably more appropriate. – MrFlick Aug 13 '21 at 16:48

2 Answers

The bw argument of geom_density() is useful here. From the documentation:

      bw: The smoothing bandwidth to be used. If numeric, the standard
          deviation of the smoothing kernel. If character, a rule to
          choose the bandwidth, as listed in 'stats::bw.nrd()'.

library(ggplot2)

ggplot(mtcars, aes(cyl)) + geom_density(bw = 0.1) + labs(title = "bw = 0.1")
ggplot(mtcars, aes(cyl)) + geom_density() + labs(title = "bw default")
ggplot(mtcars, aes(cyl)) + geom_density(bw = 2) + labs(title = "bw = 2")

bandwidth of 0.1

default bandwidth

bandwidth of 2
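
To see why the default leaves valleys, it can help to compare the default bandwidth with the gap between adjacent values. This is only a sketch, assuming geom_density() is left at its documented default rule (bw = "nrd0", i.e. stats::bw.nrd0()); the variable names are just for illustration.

```r
# Discrete example data: cyl only takes the values 4, 6 and 8
x <- mtcars$cyl

# Bandwidth chosen by the default rule (assumed here: "nrd0")
bw_default <- stats::bw.nrd0(x)

# Smallest gap between two distinct observed values
gap <- min(diff(sort(unique(x))))

# When the bandwidth is small relative to the gap, each distinct value
# gets its own bump and the curve dips in between
c(bandwidth = bw_default, gap = gap, ratio = bw_default / gap)
```

If the ratio is well below 1, a larger bw (or adjust above 1) should fill in the valleys, at the cost of blurring real differences between neighbouring values.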

Or, as MrFlick suggested, you can use adjust=:

  adjust: A multiplicative bandwidth adjustment. This makes it possible
          to adjust the bandwidth while still using a bandwidth
          estimator. For example, 'adjust = 1/2' means use half of the
          default bandwidth.

ggplot(mtcars, aes(cyl)) + geom_density(adjust = 0.5) + labs(title = "adjust = 0.5")
ggplot(mtcars, aes(cyl)) + geom_density(adjust = 0.9) + labs(title = "adjust = 0.9")

adjust of 0.5

adjust of 0.9
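
Note that adjust values below 1 shrink the bandwidth, so they make the valleys deeper; to smooth them over you would want a value above 1. A quick way to check what adjust does is to look at the bandwidth that stats::density() reports. This is a base R sketch rather than the ggplot2 internals, on the assumption that geom_density() passes adjust through in the same way.

```r
x <- mtcars$cyl

d_default <- density(x)              # default bandwidth rule
d_wide    <- density(x, adjust = 2)  # default bandwidth multiplied by 2

# The bandwidth actually used is stored in the $bw element
d_default$bw
d_wide$bw

# The second should be exactly twice the first
all.equal(d_wide$bw, 2 * d_default$bw)
```

In the plot itself the equivalent would be something like geom_density(adjust = 2), tuned by eye.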

r2evans

A density plot is not ideal for this data. You want to compare how the responses on the 1-7 scale are distributed across different questions/groups, so it is probably better to map the relative frequency of each response to geom_line, geom_area, or both.

Here is an example using survey data from Kaggle:

library(tidyverse)

my_data <- read_csv("~/Downloads/archive/test.csv")

plot_data <- my_data %>%
  select(id, `Inflight wifi service`:`Food and drink`) %>%
  pivot_longer(`Inflight wifi service`:`Food and drink`, names_to = "question", values_to = "response") %>%
  count(question, response) %>%
  group_by(question) %>%
  mutate(freq = n / sum(n))

ggplot(plot_data) +
  geom_area(aes(x = response, fill = question, y = freq), alpha = 0.5)

enter image description here
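
Since the Kaggle file is not bundled with R, here is a self-contained sketch of the same idea. The data frame and column names (dface, Current.Mood, NewCode) are borrowed from the question, but the values are simulated and the group labels "A", "B", "C" are made up, so treat it as an illustration only. It uses geom_line rather than geom_area because geom_area stacks the groups by default, which can be misleading when the goal is to overlay them.

```r
library(ggplot2)
library(dplyr)

# Simulated stand-in for the asker's data (values are NOT real)
set.seed(1)
dface <- data.frame(
  Current.Mood = sample(1:7, 300, replace = TRUE),
  NewCode      = sample(c("A", "B", "C"), 300, replace = TRUE)
)

# Relative frequency of each response within each group
plot_data <- dface %>%
  count(NewCode, Current.Mood) %>%
  group_by(NewCode) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()

# One line per group, with a point at every integer response
p <- ggplot(plot_data, aes(Current.Mood, freq, colour = NewCode)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 1:7)
p
```

If you prefer filled areas, adding position = "identity" to geom_area should overlay the groups instead of stacking them.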

Jeff Parker