How to add sample size used in plotting geom_jitter

Question

I want to show how many samples were used in a graph, next to my stat_cor (ggpubr) text.

I'm using the following code to generate the graph:

library(ggplot2)
library(ggpubr)

dataset = mtcars

ggplot(dataset, aes(dataset$wt, dataset$disp)) +
  geom_jitter() +
  geom_smooth(level=0.95, method = "loess") +
  stat_cor(method="spearman") +
  theme_classic()

However, I want to plot multiple graphs in one figure using a real dataset in which different variables have different numbers of missing values, so it would be nice to show the sample size that was actually used to plot each geom_jitter.

As an aside: [don't use `$` in `aes()`](https://stackoverflow.com/a/32543753/8583393); use `ggplot(mtcars, aes(wt, disp))` — markus, Jan 23 '20 at 21:08
As I understand it, this issue has been fixed (https://github.com/tidyverse/ggplot2/pull/2694) — Gabriel G., Jan 23 '20 at 21:12
Using `$` in `aes()` is still bad practice... there's no advantage to doing it. It doesn't break facets anymore, but it may still have other unintended consequences (in the past it has certainly caused issues with binning functions as well, e.g., `stat_bin_2d`) — Gregor Thomas, Jan 23 '20 at 21:17
I found that using $ in aes is sometimes needed when you have special characters in your column names, although I'm not experienced enough with R to explain why this happens (I never looked into it; I just found a workaround). I agree that I should be careful about doing this; thanks for pointing it out. — Gabriel G., Jan 23 '20 at 21:21
If you're using special characters in your column names you need backticks, but you shouldn't need `$`. — Gregor Thomas, Jan 23 '20 at 21:22
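The backtick suggestion can be sketched as follows. This is a minimal illustration, assuming a hypothetical data frame whose column names (`body weight (t)`, `disp (cu.in.)`) contain spaces and parentheses; `check.names = FALSE` keeps those names exactly as written so that `aes()` has to quote them with backticks instead of reaching for `$`.

```r
library(ggplot2)

# Hypothetical data frame with non-syntactic column names (spaces, parentheses);
# check.names = FALSE stops data.frame() from rewriting them
df <- data.frame(
  `body weight (t)` = mtcars$wt,
  `disp (cu.in.)`   = mtcars$disp,
  check.names = FALSE
)

# Backticks let aes() refer to these columns directly, with no df$... needed
p <- ggplot(df, aes(`body weight (t)`, `disp (cu.in.)`)) +
  geom_jitter() +
  theme_classic()

print(p)
```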

score 4 · Answer 1 · edited Feb 05 '20 at 18:59

It's a little hacky (and limited in its options), but you can use the label.sep argument to insert the sample size between the correlation coefficient and the p-value. (Note that somewhat older versions of ggpubr have a bug with label.sep; if this doesn't work for you, try updating the package.)

ggplot(mtcars, aes(wt, disp)) +
  geom_jitter() +
  geom_smooth(level = 0.95, method = "loess") +
  stat_cor(method = "spearman", label.sep = sprintf(", n = %s, ", nrow(mtcars))) +
  theme_classic()

If your concern is missing values, you may need something other than nrow(), but I'll leave that to you. This also will not work with facets: you'll get the same number in every facet.

For a fully flexible solution, I think you could use a geom_text, or maybe a stat_summary with geom = "text" would be possible?
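A rough sketch of the geom_text idea, assuming the panels are defined by `facet_wrap(~ cyl)` (cyl is just a stand-in for whatever facet variable you use). The counts are computed beforehand in a small data frame with one row per facet level, so each panel gets its own n instead of the single global value that label.sep produces. `n_df` is a hypothetical name; the label is pinned to the bottom-right corner of each panel via `x = Inf, y = -Inf`.

```r
library(ggplot2)
library(ggpubr)

# Rows where BOTH plotted variables are present (what the points actually use)
ok <- complete.cases(mtcars[c("wt", "disp")])

# One row per facet level: count of usable rows, plus a ready-made label
n_df <- aggregate(list(n = ok), by = list(cyl = mtcars$cyl), FUN = sum)
n_df$label <- paste0("n = ", n_df$n)

p <- ggplot(mtcars, aes(wt, disp)) +
  geom_jitter() +
  geom_smooth(level = 0.95, method = "loess") +
  stat_cor(method = "spearman") +
  # n_df carries the facet column (cyl), so each panel picks up its own label
  geom_text(
    data = n_df,
    aes(x = Inf, y = -Inf, label = label),
    hjust = 1.1, vjust = -0.5,
    inherit.aes = FALSE
  ) +
  facet_wrap(~ cyl) +
  theme_classic()

print(p)
```

Because the counting happens outside ggplot, you can swap complete.cases() for any rule you like without touching the plotting code.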

Or, if nothing else works, go hardcore like this answer.

Just for completeness on missing values:

ggplot(mtcars, aes(wt, disp)) +
  geom_jitter() +
  geom_smooth(level = 0.95, method = "loess") +
  stat_cor(method = "spearman", label.sep = sprintf(
    ", n = %s, ",
    sum(complete.cases(mtcars[c("wt", "disp")]))
  )) +
  theme_classic()

This plots N as the number of rows that are complete for both wt and disp, as the example shows.
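To count complete cases over any set of columns, the same expression can be wrapped in a small helper. `n_complete` is a hypothetical name, not part of ggpubr. The sketch uses the built-in airquality data because, unlike mtcars, it actually has missing values (in Ozone and Solar.R), so nrow() would overstate the sample size there.

```r
library(ggplot2)
library(ggpubr)

# Hypothetical helper: number of rows with no NA in any of the given columns,
# i.e. the rows geom_jitter() and stat_cor() can actually use
n_complete <- function(df, cols) {
  sum(complete.cases(df[cols]))
}

# Ozone has missing values, so this is smaller than nrow(airquality)
cols <- c("Temp", "Ozone")
n_used <- n_complete(airquality, cols)

p <- ggplot(airquality, aes(Temp, Ozone)) +
  geom_jitter() +
  stat_cor(method = "spearman",
           label.sep = sprintf(", n = %s, ", n_used)) +
  theme_classic()

print(p)
```

Passing a different `cols` vector to the helper gives the matching n for each panel when several plots are combined into one figure.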

I added some code to compute N from complete.cases() over any set of columns. — Gabriel G., Feb 04 '20 at 20:46
