R Sort Cleveland Dot Plot by not shown variable

Question

I followed this manual (https://afit-r.github.io/cleveland-dot-plots) to create a Cleveland dot plot, which I was able to reproduce, but I ran into the following challenges:

How do I sort my y-axis in historical order? The varieties on my y-axis have different release years, and although those are not shown in my plot, I would like to order them historically. Right now they are in some weird alphabetical order starting from the back, and I don't know how to change that.
I couldn't manage to show the differences between the plots in percentages (like in the manual). Could anyone explain that to me in more detail?
Do you see any possibility of including the same data for another year?

See below for my code and picture:

require(ggplot2)
require(reshape2)
require(dplyr)
require(plotrix)
cleanup = theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line = element_line(color = "black"))

data19 = read.csv("Harvest_2019_V2.csv", sep = ";")
data19$Experiment_Year <- as.factor(data19$Experiment_Year)
data19$Release_year <- as.factor(data19$Release_year)
Subset2019 = subset(data19, Experiment_Year == 2019)

agHarvest.Weight <- aggregate(Subset2019[, 9], list(Subset2019$Variety,Subset2019$Release_year,Subset2019$Treatment), mean)
agHarvest.Weight$Variety <- agHarvest.Weight$Group.1
agHarvest.Weight$Release_Year <- agHarvest.Weight$Group.2
agHarvest.Weight$Treatment <- agHarvest.Weight$Group.3
agHarvest.Weight$Yield <- agHarvest.Weight$x

right_label <- agHarvest.Weight %>%
  group_by(Variety) %>%
  arrange(desc(Yield)) %>%
  top_n(1)

left_label <- agHarvest.Weight %>%
  group_by(Variety) %>%
  arrange(desc(Yield)) %>%
  slice(2)

ggplot(agHarvest.Weight, aes(Yield, Variety)) +
  geom_line(aes(group = Variety)) +
  geom_point(aes(color = Treatment), size = 1.5) + 
  geom_text(data = right_label, aes(color = Treatment, label = round(Yield, 0)),
            size = 3, hjust = -.5) +
  geom_text(data = left_label, aes(color = Treatment, label = round(Yield, 0)),
            size = 3, hjust = 1.5) +
  scale_x_continuous(limits = c(2500, 4500)) + cleanup + xlab("Yield, g") +
  scale_color_manual(values=c("blue","darkgreen"))

[Image: R Plot Picture]

I am sorry, but I can't, as I don't have permission to share the data. Is there any other way you could help me? — iMate, Oct 03 '20 at 14:22

score 1 · Answer 1 · answered Oct 03 '20 at 23:35

OP: understandably, you cannot always share data, for various reasons. This is why it is always recommended to either use an existing publicly available dataset or craft your own in order to produce a minimal reproducible example. Fortunately, you're in luck, as I don't mind doing this for you. :)

TL;DR - there are many ways, but the simplest method is to use reorder(your_variable, variable_to_sort_by). Note that the y axis direction goes "bottom-up" rather than "top-to-bottom" on the plot.

Example Data

df <- data.frame(
  Variety=rep(LETTERS[1:5], each=2),
  Yield=c(265, 285, 458, 964, 152, 202, 428, 499, 800, 900),
  Treatment=rep(c('first','second'), 5),
  Year=rep(c(2000, 2001, 2010, 1999, 1998), each=2)
)

> df
   Variety Yield Treatment Year
1        A   265     first 2000
2        A   285    second 2000
3        B   458     first 2001
4        B   964    second 2001
5        C   152     first 2010
6        C   202    second 2010
7        D   428     first 1999
8        D   499    second 1999
9        E   800     first 1998
10       E   900    second 1998

Basic Cleveland Dot Plot

library(ggplot2)

p <- ggplot(df, aes(x=Yield, y=Variety)) +
  geom_line(aes(group=Variety)) +
  geom_point(size=3) +
  geom_text(aes(label=Yield), nudge_y=0.2, size=2) +
  theme_bw()
p

Sort Variety (Y axis) by Year Column

You should first notice how ggplot2 arranges your axes. The origin of the plot is the bottom-left corner, so the lowest values on the x and y axes are at the left and bottom, respectively. This is why df$Variety is alphabetical but "goes up" (from bottom to top). To reverse the y axis, you can add scale_y_reverse() to your plot code, but that only works for continuous axes. For discrete axes, you can use scale_y_discrete(limits=rev(unique(df$Variety))). You'll see that the following approach avoids that.
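For reference, here is a minimal sketch of reversing a discrete y axis, using the same toy data as the example above. The names rev_limits and p_rev are made up for illustration; the limits are built from the sorted unique values so that no duplicates are passed to the scale.

```r
library(ggplot2)

df <- data.frame(
  Variety=rep(LETTERS[1:5], each=2),
  Yield=c(265, 285, 458, 964, 152, 202, 428, 499, 800, 900),
  Treatment=rep(c('first','second'), 5),
  Year=rep(c(2000, 2001, 2010, 1999, 1998), each=2)
)

# Default discrete order is alphabetical and drawn bottom-to-top (A at the bottom).
# Reversed limits put A at the top instead.
rev_limits <- rev(sort(unique(df$Variety)))

p_rev <- ggplot(df, aes(x=Yield, y=Variety)) +
  geom_line(aes(group=Variety)) +
  geom_point(size=3) +
  scale_y_discrete(limits=rev_limits) +
  theme_bw()
p_rev
```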

To sort the y axis by another column, you can use reorder() right within the aes() call. The reorder() function is basically set up as follows:

reorder(columnA, column_to_use_to_sort_columnA)

In this case, you'll want to sort df$Variety by df$Year, so this should become:

reorder(Variety, Year)

...but remember how the y axis "goes up"? If you want the y axis to be sorted by df$Year and "go down", you can either reverse the axis by passing the reversed levels, e.g. scale_y_discrete(limits=rev(levels(reorder(df$Variety, df$Year)))), or more conveniently just sort by df$Year in reverse using the syntax:

reorder(Variety, -Year)
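As a quick check on the example data, the resulting order can be inspected without drawing anything by looking at the factor levels that reorder() returns. This is only a sketch on the toy df; asc_levels and desc_levels are illustrative names. Keep in mind that ggplot2 draws the first level at the bottom of a discrete y axis.

```r
df <- data.frame(
  Variety=rep(LETTERS[1:5], each=2),
  Yield=c(265, 285, 458, 964, 152, 202, 428, 499, 800, 900),
  Treatment=rep(c('first','second'), 5),
  Year=rep(c(2000, 2001, 2010, 1999, 1998), each=2)
)

# reorder() scores each Variety by the mean of Year (its default FUN)
# and sorts the levels by ascending score
asc_levels  <- levels(reorder(df$Variety, df$Year))
desc_levels <- levels(reorder(df$Variety, -df$Year))

asc_levels   # "E" "D" "A" "B" "C": E (1998) would sit at the bottom
desc_levels  # "C" "B" "A" "D" "E": E (1998) would sit at the top
```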

Putting this together you get this:

p1 <- ggplot(df, aes(x=Yield, y=reorder(Variety, -Year))) +
  geom_line(aes(group=Variety)) +
  geom_point(size=2) +
  geom_text(aes(label=Yield), nudge_y=0.2, size=2) +
  theme_bw()
p1

You'll see we have our proper order now, where df$Variety is sorted by ascending df$Year, starting from the top (1998) and going down to the bottom (2010).
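One caveat when carrying this over to the question's code: there, Release_year is converted with as.factor(). Arithmetic such as -Release_year is not meaningful for factors (R returns NA with a warning), so the sort key likely needs to be numeric again. The sketch below assumes the column holds plain four-digit years; toy and Release_num are hypothetical names, not part of the asker's data.

```r
# Hypothetical stand-in for the asker's data: Release_year stored as a factor
toy <- data.frame(
  Variety = c("A", "B", "C"),
  Release_year = factor(c(2000, 2010, 1998))
)

# Convert the factor labels back to numbers. as.numeric() alone would return
# the internal level codes (1, 2, 3, ...) rather than the years.
toy$Release_num <- as.numeric(as.character(toy$Release_year))

# Latest release first in the levels, so the oldest variety ends up at the top
year_levels <- levels(reorder(toy$Variety, -toy$Release_num))
year_levels
```

In a plot this would correspond to something like aes(Yield, reorder(Variety, -Release_num)), with Release_num computed on the aggregated data frame.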

Other ways?

There are other ways to do your sorting, but I found this the most straightforward. The other fundamentally different approach would be to sort your data frame first, then plot. However, if you do this, be aware that ggplot2 will convert any column with discrete values into a factor first, and the default factor levels are created by sorting the names in alphabetical order. This means that if you sort your data frame first, then plot, you'll still be stuck with alphabetical order. You would need to sort, then explicitly convert df$Variety into a factor (and specify the levels), then plot. Something like this works just the same:

df <- dplyr::arrange(df, -Year)  # arrange by descending Year
df$Variety <- factor(df$Variety, levels=unique(df$Variety))  # factor and indicate levels

# The first level (latest Year) is drawn at the bottom, so the earliest Year
# ends up at the top; no extra scale_y_discrete() reversal is needed here
ggplot(df, aes(x=Yield, y=Variety)) +
  geom_line(aes(group=Variety)) +
  geom_point(size=2) +
  geom_text(aes(label=Yield), nudge_y=0.2, size=2) +
  theme_bw()

The code above gives you the same plot as the method using reorder(Variety, -Year).
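One way to convince yourself that the two approaches agree is to compare the factor levels each one produces. This is a sketch on the toy data, using base R order() in place of dplyr::arrange(); by_reorder and by_factor are illustrative names.

```r
df <- data.frame(
  Variety=rep(LETTERS[1:5], each=2),
  Yield=c(265, 285, 458, 964, 152, 202, 428, 499, 800, 900),
  Treatment=rep(c('first','second'), 5),
  Year=rep(c(2000, 2001, 2010, 1999, 1998), each=2)
)

# Approach 1: let reorder() derive the levels from -Year
by_reorder <- levels(reorder(df$Variety, -df$Year))

# Approach 2: sort rows by descending Year, then fix the levels in order of appearance
sorted <- df[order(-df$Year), ]
by_factor <- levels(factor(sorted$Variety, levels=unique(sorted$Variety)))

identical(by_reorder, by_factor)  # TRUE
```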

See my other answer, as the comment section was not enough to answer you. — iMate, Oct 05 '20 at 09:13
I see... it seems that without a representative portion of your data it is going to be very difficult to help you much further. This is something where the solution will come out of understanding how the dataset is organized. As it stands, we don't have a minimal reproducible example. — chemdork123, Oct 05 '20 at 13:52
