I have a simple dataframe containing three columns:

ST_CODE    |    VALUE    |    HEIGHT
...             ...           ...
factor          continuous    continuous

I want a VALUE boxplot for each ST_CODE, but I want the order on the x axis to be determined by the ascending order of HEIGHT. This is the code:

ggplot(ozone, aes(x = ST_CODE, y = VALUE)) +
    geom_boxplot(notch=TRUE)

Ordering ozone inside the ggplot call with ozone[order(ozone$HEIGHT),] had no effect, because the x axis order is determined by ST_CODE itself, not by the row order of the data frame. What should I do?

Here's the dataset: https://www.dropbox.com/s/kf0jcv50oaa5my9/ozone_example.csv?dl=0

I have found this question, but I didn't really get it: Rearrange x axis according to a variable in ggplot

Pigna
  • In which package can I find the `ozone` data? Or is your example data non-public? If so, please add a minimal reproducible example containing a small data set – R Yoda Feb 11 '18 at 18:18
  • Possible duplicate of [How to change order of boxplots when using ggplot2?](https://stackoverflow.com/questions/6867393/how-to-change-order-of-boxplots-when-using-ggplot2) – dww Feb 11 '18 at 18:44
  • I have added the dataset – Pigna Feb 11 '18 at 18:47
  • Thanks for the dataset, but please don't link to Dropbox; instead add a **minimal** example inside your question as R code and show the expected result (e.g. by writing the expected order of ST_CODE in the plot). – R Yoda Feb 11 '18 at 18:51
  • I cannot show the expected result: the order of the boxplots depends on the height of the station (which won't appear in the graph). If station x has height 20 and station y has height 12, then the boxplot for station y should appear to the left of the boxplot for station x – Pigna Feb 11 '18 at 18:57
  • Are you sure the "reader" of the boxplot result can interpret the boxplots intuitively if the ordering logic is not obvious? I mean the HEIGHT is not visible in the plot... – R Yoda Feb 11 '18 at 19:00
  • I don't know. That's why I asked ... – Pigna Feb 12 '18 at 09:47
  • @Pigna So does my proposed solution work as you wanted? I have modified the answer to use your data file... – R Yoda Feb 12 '18 at 11:31
  • Thanks, @RYoda it works! I couldn't check stackoverflow all day long – Pigna Feb 12 '18 at 20:37

1 Answer

The solution is to order the levels of the factor variable ST_CODE according to the HEIGHT column, because ggplot2 draws a discrete x axis in factor-level order.

Until you provide example data this is my best guess :-)

Edit 1: I have added read.csv to read your data, and as far as I can tell it works. To make the result easier to check, I have used only the first 1000 rows, which contain only three different ST_CODEs.

library(ggplot2)

# example data
# data <- data.frame( ST_CODE = rep(c("A", "B", "C"), 2), VALUE = rep(3:1, 2), HEIGHT = rep(c(2, 1, 3), 2))
# data

# Your data
data <- read.csv("ozone_example.csv")
data <- data[1:1000,]
table(data$ST_CODE, data$HEIGHT) # indicates how to order ST_CODEs


# plot (not sorted by HEIGHT)
ggplot(data, aes(x = ST_CODE, y = VALUE)) +
  geom_boxplot(notch=TRUE)

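# Plot sorted by HEIGHT by changing the factor level order:
# sort the rows by HEIGHT, then take the ST_CODEs in order of first appearance,
# which yields the stations in ascending HEIGHT
ordered.data <- data[order(data$HEIGHT),]
data$ST_CODE <- factor(data$ST_CODE, levels = unique(ordered.data$ST_CODE))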
ggplot(data, aes(x = ST_CODE, y = VALUE)) +
  geom_boxplot(notch=TRUE)
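
As a possible alternative (a sketch, not part of the answer as originally tested), base R's reorder() can set the level order in one step. The `toy` data frame and all of its values below are made up for illustration; only the column names ST_CODE, VALUE and HEIGHT come from the question. The plotting part assumes ggplot2 is installed and is skipped otherwise.

```r
# Toy data: hypothetical values, NOT taken from ozone_example.csv
toy <- data.frame(
  ST_CODE = factor(rep(c("A", "B", "C"), each = 4)),
  VALUE   = c(30, 32, 31, 35, 40, 42, 41, 44, 20, 22, 21, 25),
  HEIGHT  = rep(c(200, 50, 120), each = 4)
)

# reorder() sorts the factor levels by a per-level summary of HEIGHT.
# Each station has a single height here, so the choice of FUN does not matter.
toy$ST_CODE <- reorder(toy$ST_CODE, toy$HEIGHT, FUN = median)
levels(toy$ST_CODE)  # "B" "C" "A", i.e. ascending HEIGHT (50, 120, 200)

# Plot only if ggplot2 is available; the x axis follows the level order
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = ST_CODE, y = VALUE)) +
    geom_boxplot()
  print(p)
}
```

The same idea can also be written inline as aes(x = reorder(ST_CODE, HEIGHT), y = VALUE), which leaves the data frame unchanged but gives the axis a less readable default label.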
R Yoda
  • To whoever just had the courtesy of downvoting this without leaving a comment: you're doing it wrong. Only with comments can we have a discussion where everybody can learn. – R Yoda Feb 11 '18 at 18:51