Our datasets contain a few absolutely huge outliers. If we plot the data (e.g. as a boxplot) and include the outliers, the axis gets so squeezed that the plot is useless. Log-scaling doesn't help. But we still want to tell the reader that the outliers exist (how many there are, and on which side of the boxplot, positive or negative), preferably without manually adding text to the caption. Is there a good method for this? Preferably in R, Matplotlib or Seaborn.
This is different from e.g. "Ignore outliers in ggplot2 boxplot", because I don't want to ignore the outliers: I want to show that they exist, but not plot them.
Sample code:
# from https://stackoverflow.com/questions/5677885/ignore-outliers-in-ggplot2-boxplot
> library("ggplot2")
> df = data.frame(y = c(-100, rnorm(100), 100))
> ggplot(df, aes(y = y)) + geom_boxplot(aes(x = factor(1)))
We see a boxplot that is useless because of the presence of outliers. If we follow the accepted answer at that link, we remove the outliers in a very nice way, but now the reader doesn't realise there were any outliers.
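To make concrete what gets lost, here is a minimal sketch of that kind of zoomed plot, with the hidden outliers counted per side. It assumes the zoom is done with `coord_cartesian()` using whisker limits from base R's `boxplot.stats()`. The names `whiskers`, `n_low` and `n_high`, the fixed seed and the 5% padding are mine, for illustration only. Note that `boxplot.stats()` and `geom_boxplot()` compute hinges slightly differently, so the limits are approximate.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
df <- data.frame(y = c(-100, rnorm(100), 100))

# Ends of the lower and upper whiskers, as computed by boxplot.stats()
whiskers <- boxplot.stats(df$y)$stats[c(1, 5)]

# Number of points beyond each whisker: this is the information
# the reader no longer sees once the plot is zoomed
n_low  <- sum(df$y < whiskers[1])
n_high <- sum(df$y > whiskers[2])

# Zoom the axis to the whisker range without dropping any data.
# The 5% padding only widens the range here because the whiskers
# lie on either side of zero.
p <- ggplot(df, aes(x = factor(1), y = y)) +
  geom_boxplot(outlier.shape = NA) +
  coord_cartesian(ylim = whiskers * 1.05)
```

Printing `p` gives a readable box, but nothing in the plot itself conveys `n_low` or `n_high`, which is exactly the gap I am asking about.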
EDIT: A couple of comments/answers ask what I actually want, but that is precisely the difficulty. I know I want an automated graphical presentation of the outliers (together with the main data), but I don't know exactly what this should look like. I hope someone in the community knows some best practice for this situation. I don't need help writing code to find outliers or add text to plots.