I am creating a plot with multiple pairs of boxes. I also have a pair of mean lines running through the plot, and many outliers are shown. The legend currently shows only the two boxes; I would like it to show the two boxes as well as the two mean lines and two outliers.

My data looks like this:

imageNum imageWidth    MinMax    SubAdd      Function
1        100           0.8854797 0.9392620   Height
2        100           0.8810220 0.9367482   Height
3        100           0.9525434 0.9756950   Height
4        100           0.8985018 0.9465377   Height
5        100           0.9174043 0.9569232   Height
6        100           0.8761194 0.9339697   Height

Function contains two categories, Height and Area. For each imageNum (1-300) there are 10 levels of resolution, stored in imageWidth (100, 300, 500, ..., 1900). MinMax is the measure I wish to plot. SubAdd is irrelevant at this point.
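
For anyone without the CSV files, the structure described above can be sketched with synthetic data. This is only a stand-in under stated assumptions: the values are random placeholders drawn from an arbitrary Beta distribution, not my results, and `make_fake` and `bound_fake` are hypothetical names that do not appear in my script.

```r
# Synthetic stand-in for the real data: same columns, made-up values.
set.seed(42)  # arbitrary seed, only so the sketch is repeatable

# Hypothetical helper: one row per imageNum (1-300) and resolution (100..1900).
make_fake <- function(fn_label) {
  d <- expand.grid(imageNum = 1:300,
                   imageWidth = seq(100, 1900, by = 200))
  d$MinMax <- rbeta(nrow(d), 20, 2)  # placeholder values in (0, 1)
  d$SubAdd <- rbeta(nrow(d), 30, 2)  # placeholder; not used in the plot
  d$Function <- fn_label
  d
}

# Stack both categories and convert the grouping columns to factors,
# mirroring the preparation applied to the real data.
bound_fake <- rbind(make_fake("Height"), make_fake("Area"))
bound_fake$imageWidth <- as.factor(bound_fake$imageWidth)
bound_fake$Function <- factor(bound_fake$Function, levels = c("Height", "Area"))

str(bound_fake)
```

Passing `bound_fake` to `ggplot()` in place of the real data frame should give a plot with the same layout of boxes, though the outliers and means will of course differ.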

library(ggplot2)

# Read the two result sets and label each with the function it measures
areas <- read.csv("allResolutionsAreaSorted.csv", header = TRUE)
areas$Function <- "Area"
heights <- read.csv("allResolutionsHeightSorted.csv", header = TRUE)
heights$Function <- "Height"

# Stack them and turn the grouping columns into factors
bound <- rbind(heights, areas)
bound$imageWidth <- as.factor(bound$imageWidth)
bound$Function <- factor(bound$Function, levels = c("Height", "Area"))

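# One pair of boxes per resolution, coloured by Function
ggplot(bound, aes(x = imageWidth, y = MinMax, color = Function)) +
  geom_boxplot(outlier.size = 1, outlier.shape = 1) +
  labs(title = "Plot of MinMax Similarity by Image Resolution",
       x = "Image Width (Pixels)", y = "MinMax Similarity") +
  theme_light() + scale_color_grey() +
  theme(plot.title = element_text(hjust = 0.5)) +
  # Dotted line through the group means, dodged to line up with the boxes.
  # `fun` replaces `fun.y`, which is deprecated since ggplot2 3.3.0.
  stat_summary(fun = mean, geom = "line", lty = 3, aes(group = Function),
               position = position_dodge(.75), show.legend = TRUE)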

This is the plot I have created so far. I have manually drawn the bottom four legend elements as I would like them to appear.

The desired legend.

Edit: my data and R file: https://drive.google.com/open?id=1JSb0wd3NtoWrs_usEigq88kyc-B5rQ4M

Jason
  • Do you want the two most extreme outliers? – Calum You Apr 08 '19 at 21:10
  • The extreme outliers show useful edge cases in my results. I have considered removing them, however I would rather present the data intact. I am really just interested in providing more information in the legend. – Jason Apr 08 '19 at 22:31
  • I have also plotted the means as points within the boxes rather than as a line. I am equally happy with either, so long as the legend indicates the mean. – Jason Apr 08 '19 at 22:33
  • Could you make your problem reproducible by sharing a sample of your data so others can help (please do not use `str()`, `head()` or screenshot)? You can use the [`reprex`](https://reprex.tidyverse.org/articles/articles/magic-reprex.html) and [`datapasta`](https://cran.r-project.org/web/packages/datapasta/vignettes/how-to-datapasta.html) packages to assist you with that. See also [Help me Help you](https://speakerdeck.com/jennybc/reprex-help-me-help-you?slide=5) & [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269) – Tung Apr 09 '19 at 01:04
  • I threw a link to my data and original R studio file above. – Jason Apr 09 '19 at 18:27

0 Answers