
I am not sure how to approach this. I want to create a "dotplot"-style plot in R from a data frame of categorical variables (factors), such that for each column of the data frame I plot a column of dots, each coloured according to its factor level. For example:

my_df <- cbind(c('sheep','sheep','cow','cow','horse'),c('sheep','sheep','sheep','sheep',NA),c('sheep','cow','cow','cow','cow'))

I then want to end up with a 3 x 5 grid of dots, each coloured according to sheep/cow/horse (well, one missing because of the NA).

Unstack

1 Answer


Do you mean something like this:

my_df <- cbind(c('sheep','sheep','cow','cow','horse'),
               c('sheep','sheep','sheep','sheep',NA),
               c('sheep','cow','cow','cow','cow'))

df <- data.frame(my_df) # make it as data.frame
df$id <- row.names(df)  # add an id

library(reshape2)
melt_df <- melt(df, 'id') # melt it

library(ggplot2)  # now the plot
p <- ggplot(melt_df, aes(x = variable, fill = value))
p + geom_dotplot(stackgroups = TRUE, binwidth = 0.3, binpositions = "all")
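To see what the melt step does, here is a small sketch that rebuilds the example data and compares the wide and long shapes. It assumes `reshape2` is installed; the column names `X1`, `X2`, `X3` are simply what `data.frame()` assigns to an unnamed matrix.

```r
library(reshape2)

# Rebuild the example data from the question (wide format)
my_df <- cbind(c('sheep','sheep','cow','cow','horse'),
               c('sheep','sheep','sheep','sheep',NA),
               c('sheep','cow','cow','cow','cow'))
df <- data.frame(my_df)
df$id <- row.names(df)

# Long format: one row per (id, original column) pair
melt_df <- melt(df, id.vars = 'id')

dim(df)       # wide: 5 rows, 4 columns (X1, X2, X3, id)
dim(melt_df)  # long: 15 rows, 3 columns (id, variable, value)
head(melt_df) # inspect the first few rows
```

Each original column becomes a level of `variable`, and its cells end up in `value`, which is why `variable` and `value` can be mapped directly to `x` and `fill`.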


s__
  • Yes, this is pretty much ideal. But instead of 'count' on the y-axis, from 0 to 1, I wanted a real count, perhaps with ticks every 5 or so. But I can try to sort that out later. Thanks. – Unstack Sep 05 '18 at 21:52
  • 'Melting' is not an obvious/user-friendly step (to me anyway!). Such a strange concept. – Unstack Sep 05 '18 at 21:54
  • ggplot2 prefers data in long format: one row per observation, with one column naming the variable and another holding its value. If you explore the melted data and compare it with the original, it is easy to understand, and you can find several questions and blog posts about long and wide formats in R. For the labels, I think it is possible to show real values and/or remove the axis. – s__ Sep 05 '18 at 22:10
  • You can add the numbers on the y-axis by following [this](https://stackoverflow.com/questions/38900487/showing-count-on-x-axis-for-dot-plot), but it's a workaround because `geom_dotplot()` seems to have an [issue](https://github.com/tidyverse/ggplot2/issues/2203), so you have to trick the y-axis labels (`+ ylim(0, max(table(melt_df$value)) + 5)` seems OK). Or you can hide the y-axis with `theme`. – s__ Sep 06 '18 at 06:36