4

What's the ggplot2 equivalent of "dotplot" histograms? With stacked points instead of bars? Similar to this solution in R:

Plot Histogram with Points Instead of Bars

Is it possible to do this in ggplot2? Ideally with the points shown as stacks and a faint line showing the smoothed line "fit" to these points (which would make a histogram shape.)

zx8754

4 Answers

9

ggplot2 supports dot plots through geom_dotplot(); see the ggplot2 manual for details.

Here is an example:

library(ggplot2)

set.seed(789); x <- data.frame(y = sample(1:20, 100, replace = TRUE))

ggplot(x, aes(y)) + geom_dotplot()

In order to make it behave like a simple dotplot, we should do this:

ggplot(x, aes(y)) + geom_dotplot(binwidth=1, method='histodot')    

You should get this:

first plot

To address the density issue, you'll have to add another term, ylim(), so that your plot call has the form ggplot() + geom_dotplot() + ylim().

More specifically, write ylim(0, A), where A is the number of stacked dots that corresponds to a density of 1.00. In the example above, the best you can do is read off the plot that 7.5 dots reach the 0.50 density mark. From there, you can infer that 15 dots will reach 1.00.

So your new call looks like this:

ggplot(x, aes(y)) + geom_dotplot(binwidth=1, method='histodot') + ylim(0, 15)

Which will give you this:

second plot

Usually, this kind of eyeball estimate will work for dotplots, but of course you can try other values to fine-tune your scale.

Notice that changing the ylim values doesn't affect how the data is displayed; it only changes the labels on the y-axis.
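As a sketch of how A could be set without visual inspection (an addition here, not part of the original answer): with coord_fixed(ratio = binwidth) and the default dotsize and stackratio, one y unit should be drawn as tall as one dot, so the y limits can be given directly in dots. The name max_count is made up for this example, and max(table(x$y)) assumes integer data with binwidth = 1, so that every distinct value forms its own stack.

```r
library(ggplot2)

set.seed(789)
x <- data.frame(y = sample(1:20, 100, replace = TRUE))

binwidth <- 1
# Height of the tallest stack; valid here only because the data are
# integers and binwidth is 1 (assumption of this sketch).
max_count <- max(table(x$y))

p <- ggplot(x, aes(y)) +
  geom_dotplot(binwidth = binwidth, method = "histodot") +
  # Make one unit on the y axis as long as `binwidth` units on the x axis,
  # i.e. as long as one dot diameter.
  coord_fixed(ratio = binwidth) +
  scale_y_continuous(name = "count",
                     limits = c(0, max_count),
                     breaks = seq(0, max_count, by = 1))

p
```

The trade-off is that coord_fixed() fixes the aspect ratio of the panel, so the plot can no longer stretch freely to fill the device.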

Waldir Leoncio
  • 2
    can you dynamically program the "A" so that you don't need to set it by visual inspection? – user3022875 Sep 01 '15 at 18:53
  • If you need binwidth to be something besides 1 (e.g. `binwidth=z`), you can still control the scaling by setting the aspect ratio to match: `coord_fixed(z)` . – Dave Nov 03 '16 at 22:14
7

As @joran pointed out, we can use geom_dotplot:

library(ggplot2)
ggplot(mtcars, aes(x = mpg)) + geom_dotplot()



Edit: (moved useful comments into the post):

The label "count" is misleading, because this is actually a density estimate; maybe you could suggest that the default label be changed to "density". The ggplot2 implementation of the dot plot follows the original one by Leland Wilkinson, so to understand clearly how it works, take a look at his paper: http://www.cs.uic.edu/~wilkinson/Publications/dots.pdf

As for an easy transformation to make the y axis show actual counts, i.e. "number of observations", the help page says:

When binning along the x axis and stacking along the y axis, the numbers on y axis are not meaningful, due to technical limitations of ggplot2. You can hide the y axis, as in one of the examples, or manually scale it to match the number of dots.

So you can use this code to hide the y axis:

ggplot(mtcars, aes(x = mpg)) + 
  geom_dotplot(binwidth = 1.5) + 
  scale_y_continuous(name = "", breaks = NULL)
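If the aim is only to know how many dots each stack holds, regardless of what the axis says, the computed layer data can be inspected. This is a small sketch rather than part of the original answer; it assumes that the data returned by layer_data() for geom_dotplot() carries a count column with the number of observations per stack, and the name stacks is made up here.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = mpg)) +
  geom_dotplot(binwidth = 1.5)

# Data computed for the first (and only) layer: one row per stack of dots.
stacks <- layer_data(p, 1)

# Number of observations in each stack, and the height of the tallest one.
stacks$count
max(stacks$count)

# Every observation lands in exactly one stack.
sum(stacks$count) == nrow(mtcars)
```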


dickoa
  • 1
    Could you explain the scaling? the x-axis is binned, but is the y-axis representing actual data points (which the label "count" would suggest)? if so, why is it from 0 to 1? it's very counterintuitive –  Apr 25 '13 at 15:13
  • 4
    You are right that the label "count" is misleading, because this is actually a density estimate; maybe you could suggest that the default label be changed to "density". The ggplot2 implementation of the dot plot follows the original one by Leland Wilkinson, so to understand clearly how it works, take a look at this paper: http://www.cs.uic.edu/~wilkinson/Publications/dots.pdf – dickoa Apr 25 '13 at 15:39
  • 2
    Is there an easy transformation to make the y axis actually be counts, i.e. "number of observations"? –  Apr 25 '13 at 20:18
  • 2
    The help page says: `When binning along the x axis and stacking along the y axis, the numbers on y axis are not meaningful, due to technical limitations of ggplot2. You can hide the y axis, as in one of the examples, or manually scale it to match the number of dots.` So you can use this code to hide the y axis: `ggplot(mtcars, aes(x = mpg)) + geom_dotplot(binwidth = 1.5) + scale_y_continuous(name = "", breaks = NULL)` – dickoa Apr 25 '13 at 21:31
  • I don't follow - I see that you can hide the y-axis... but that just gets rid of it. I just want the y-axis to be # of dots instead. –  Apr 26 '13 at 00:04
  • 1
    The current implementation has a meaningless y axis (which is not a big deal for a dot plot), so without creating your own function (tweaking `geom_dotplot`) I don't see how to achieve what you want. I really want to help but don't have much time to do this now. Try the ggplot2 mailing list; there are a lot of ggplot2 experts over there. I can remove my answer if you want – dickoa Apr 26 '13 at 04:51
  • thanks appreciate your help... don't think you should remove your answer as it is very helpful –  Apr 26 '13 at 05:12
3

Here is an exact approach that builds on @Waldir Leoncio's latter method, i.e. setting ylim(0, A).

library(ggplot2); library(grid)

set.seed(789)
x <- data.frame(y = sample(1:20, 100, replace = TRUE))

g <- ggplot(x, aes(y)) + geom_dotplot(binwidth=0.8)
g  # draw the plot first so that its dimensions can be read

### calculation of width and height of panel
grid.ls(view=TRUE, grob=FALSE)
real_width <- convertWidth(unit(1,'npc'), 'inch', TRUE)
real_height <- convertHeight(unit(1,'npc'), 'inch', TRUE)

### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$panel$ranges[[1]]$x.range)
real_binwidth <- real_width / width_coordinate_range * 0.8  # dot diameter in inches; 0.8 is the binwidth argument
num_balls <- real_height / 1.1 / real_binwidth  # number of stacked dots that fit vertically; 1.1 is the expansion factor
   # num_balls is the value of A

g + ylim(0, num_balls)


cuttlefish44
1

Apologies: I don't have enough reputation to comment.

I like cuttlefish44's "exact approach", but to make it work (with ggplot2 2.2.1) I had to change the following line from:

### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$panel$ranges[[1]]$x.range)

to

### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$layout$panel_ranges[[1]]$x.range)
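Since the location of the panel ranges has moved between ggplot2 releases, a version-tolerant lookup may be handy. The helper below is only a sketch: panel_x_range is a made-up name, and the panel_params branch rests on the assumption that more recent ggplot2 versions (3.0.0 and later) store the ranges under layout$panel_params, which is not covered by the answers above.

```r
library(ggplot2)

# Hypothetical helper: x range of the first panel, trying the locations
# used by different ggplot2 versions in turn.
panel_x_range <- function(plot) {
  built <- ggplot_build(plot)
  layout <- built$layout
  ranges <- if (!is.null(layout$panel_params)) {
    layout$panel_params   # assumed location in ggplot2 >= 3.0.0
  } else if (!is.null(layout$panel_ranges)) {
    layout$panel_ranges   # ggplot2 2.2.x, as in the fix above
  } else {
    built$panel$ranges    # older versions, as in the original code
  }
  ranges[[1]]$x.range
}

g <- ggplot(mtcars, aes(mpg)) + geom_dotplot(binwidth = 1.5)
width_coordinate_range <- diff(panel_x_range(g))
```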
camelCase