When I plot the density of some numerical data using either geom_density() or stat_density(), I get a non-smooth curve. Changing adjust does not fix this.

Here I've used facet_zoom(), but coord_cartesian(xlim = c(...)) also produces this non-smooth curve. Pretty weird in my opinion. Any suggestions as to what's going on?

https://drive.google.com/file/d/1PjQp7XkY5G21NoIo8y8lyeaXKvuvrqVk/view?usp=sharing

Edit: I have uploaded 50000 rows of the original data. To reproduce the plot (not using ggforce), use the code:

library(ggplot2)

data <- read.table("rep.txt")

( 
  ggplot(data, aes(x = x))
  + geom_density(adjust = 1, fill = "grey")
  + coord_cartesian(xlim = c(-50000,50000))
  + labs(x = "", y = "")
  + theme_bw()
)

markus
mas2

1 Answer

I ran your code but could not reproduce the exact image in your original question. Are you concerned about the lack of smoothness at the very tip of the geom_density() plot? You can try other arguments such as kernel and bw, but the sheer number of zeroes in your data will make it hard to get a smooth curve (unless you ramp up your adjust value).

library(tidyverse)
options(scipen = 999999)

# https://stackoverflow.com/questions/33135060/read-csv-file-hosted-on-google-drive
id <- "1PjQp7XkY5G21NoIo8y8lyeaXKvuvrqVk" # google file ID
data <- read.table(sprintf("https://docs.google.com/uc?id=%s&export=download", id)) %>%
  rownames_to_column(var = "var")

ggplot(data, aes(x = x)) + 
  geom_density(
    adjust = 10, 
    fill = "grey", 
    kernel = "cosine",
    bw = "nrd0") + 
  coord_cartesian(xlim = c(-50000,50000)) + 
  labs(x = "", y = "") + theme_bw()

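As a rough illustration of what adjust and bw do, here is a minimal sketch using base R's density() on synthetic data. The vector x below is a made-up stand-in (many exact zeros plus widely spread non-zero values), not the posted file, so the numbers only show the mechanism. It also checks one possible contributing factor that I have not confirmed against the real data: the curve is evaluated on a fixed grid of n = 512 points over the full data range, so a narrow zoom window may contain only a handful of them.

```r
# Synthetic stand-in for the posted data (assumed shape, not the real file):
# many exact zeros plus a wide spread of non-zero values.
set.seed(1)
x <- c(rep(0, 3000), rnorm(2000, mean = 0, sd = 1e6))

d1  <- density(x, bw = "nrd0", adjust = 1)
d10 <- density(x, bw = "nrd0", adjust = 10)

# adjust multiplies the bandwidth chosen by the bw rule
c(adjust_1 = d1$bw, adjust_10 = d10$bw)

# The estimate is evaluated on a fixed grid (n = 512 by default)
# spanning the full range of x, not just the zoomed window
length(d1$x)

# Number of grid points that fall inside the zoomed window
in_window <- sum(d1$x >= -50000 & d1$x <= 50000)
in_window

# Requesting more evaluation points gives a denser grid in the same window
d_fine <- density(x, bw = "nrd0", adjust = 1, n = 2^14)
in_window_fine <- sum(d_fine$x >= -50000 & d_fine$x <= 50000)
in_window_fine
```

stat_density() in ggplot2 takes the same n argument (default 512), so if the real data spans a much wider range than the zoom window, passing a larger n to geom_density() might be worth trying alongside adjust.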

# I didn't export images for these, but they showcase how many zeroes you have
ggplot(data, aes(x = x)) + 
  geom_histogram(bins = 1000) +
  coord_cartesian(xlim = c(0,50000)) + 
  labs(x = "", y = "") + theme_bw()

ggplot(data, aes(x = x)) + 
  geom_freqpoly(bins = 1000) +
  coord_cartesian(xlim = c(0,50000)) + 
  labs(x = "", y = "") + theme_bw()
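If you would rather put a number on the zeroes than read it off a plot, something like the following could work. The data frame here is a hypothetical stand-in built for illustration; with the real file you would skip the first two lines and run the rest on your own data.

```r
# Hypothetical stand-in for the posted data frame (not the real file)
set.seed(42)
data <- data.frame(x = c(rep(0, 600), round(rnorm(400, sd = 1e4))))

# How many values are exactly zero, and what share of the data is that?
n_zero     <- sum(data$x == 0)
share_zero <- mean(data$x == 0)

# How many values are negative?
n_negative <- sum(data$x < 0)

c(n_zero = n_zero, share_zero = share_zero, n_negative = n_negative)

# The most frequent values: a single dominant value shows up at the top
top_values <- head(sort(table(data$x), decreasing = TRUE), 5)
top_values
```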
jrcalabrese
  • Yeah, I have done a ton of exploratory analysis on raw data and have never encountered a non-smooth density plot. – mas2 Oct 23 '22 at 15:38
  • It seems like it's just the nature of this data: you have a lot of zeroes (N = 31956), relatively few negative values, and non-zero values infrequently repeat (look at `table(data$x)`). – jrcalabrese Oct 23 '22 at 15:53