Problem:
I have a data frame with 2 variables (x, y). The y variable "typically" varies within a "small" range, but the data frame contains a few outliers. Here's an example:
# uniform sample data frame
# y variable "typically" varying in a "small" range between 0 and 1
df = data.frame(
x = 1:100,
y = runif(100)
)
# add 2 outliers to the data frame,
# yielding a data frame
# with 98 typical values and 2 outliers
df[3, 2] = 50
df[4, 2] = -50
So the data frame has 98 typical values and 2 outliers in the y variable, as you can see from the first 10 rows, head(df, 10):
x y
1 1 0.9785541
2 2 0.2321611
3 3 50.0000000
4 4 -50.0000000
5 5 0.8316717
6 6 0.1135077
7 7 0.9633120
8 8 0.1473229
9 9 0.1436269
10 10 0.9252299
When plotting the data frame as a bar plot (y~x), ggplot2 automatically (and correctly) scales the y-axis to the full range of observed y-values:
require("ggplot2")
ggplot(df, aes(x, y)) + geom_bar(stat="identity")
In order to focus on the "typical" values, I'd like ggplot2 to keep the y-axis on the "small" scale and let the outliers run off the axis limits.
Here's my first attempt:
lower.cut = quantile(df$y, 0.02)
# = 0.01096518
upper.cut = quantile(df$y, 0.98)
# = 0.9872347
ggplot(df, aes(x, y)) + geom_bar(stat="identity") +
coord_cartesian( ylim = c(-lower.cut*1.1, upper.cut*1.1) )
Question:
The first attempt has the disadvantage that the 0.02 and 0.98 quantile settings are rather arbitrary.
Is there a smarter (less arbitrary, more statistically grounded) way to have ggplot2 automatically limit its axis to typical values while allowing outliers to be plotted off the axis limits?
Answers I looked into:
- Ignore outliers in ggplot2 boxplot: focuses on ggplot2's geom_boxplot rather than geom_bar.
- ggplot2 barplot dealing with 1 outlier pushing the axis up [duplicate]: the answers hint at using facets to exclude the outlier, which I do not want to do. The question is also marked as a duplicate, but the linked question "What are alternatives to broken axes?" only provides answers on the general challenge of dealing with axes stretched by outliers, not a specific solution to my question.
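For comparison, the boxplot idea from the first linked answer might be adapted to geom_bar along the following lines. This is only a sketch, not a verified solution: it assumes that Tukey's rule (whiskers at the most extreme points within 1.5 times the interquartile range of the hinges, as computed by base R's boxplot.stats()) is an acceptable definition of "typical". The seed and the names whiskers, y.limits and p are introduced here purely for illustration.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed, only to make the sketch reproducible

# same construction as the example data frame above
df <- data.frame(x = 1:100, y = runif(100))
df[3, 2] <- 50
df[4, 2] <- -50

# boxplot.stats()$stats holds five numbers: lower whisker, lower hinge,
# median, upper hinge, upper whisker; elements 1 and 5 are the whisker ends
whiskers <- boxplot.stats(df$y)$stats[c(1, 5)]

# bars are drawn from the zero baseline, so make sure 0 stays inside the limits
y.limits <- range(c(0, whiskers))

# coord_cartesian() zooms the view without dropping rows,
# so the outlier bars are clipped at the panel edge rather than removed
p <- ggplot(df, aes(x, y)) +
  geom_bar(stat = "identity") +
  coord_cartesian(ylim = y.limits)
```

Calling print(p) draws the plot. The coef argument of boxplot.stats() (default 1.5) still encodes a convention, so this swaps one tunable threshold for a more widely recognised one rather than removing the choice entirely.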