I'm running into several problems producing a graph that labels outliers. This is a sample of my data:

dput(x)
    structure(list(Sample = c("EM16b", "TK01a", "TK34a", "WB25", 
    "BK29", "EM09b", "TK02b", "TK29a", "TK20a", "PT57", "PT59", "EM18a", 
    "PT01", "EM05b", "BK16", "EM07b", "WB17", "BK01", "PT22", "WB07"
    ), Tribe = c("ElMolo", "Turka", "Turka", "Webuye", "Baka", "ElMolo", 
    "Turka", "Turka", "Turka", "PT_Luo", "PT_Luo", "ElMolo", "PT_Luo", 
    "ElMolo", "Baka", "ElMolo", "Webuye", "Baka", "PT_Luhya", "Webuye"
    ), Breath.d13C = c(-22.63, -18.91, -19.23, -17.61, -23.07, -21.26, 
    -16.86, -14.54, -22.37, NA, -23.22, -17.54, -19.76, -20.18, -24.52, 
    -19.17, -16.09, -25.62, -19.92, -15.04)), .Names = c("Sample", 
    "Tribe", "Breath.d13C"), row.names = c(32L, 65L, 131L, 166L, 
    270L, 18L, 68L, 121L, 103L, 231L, 233L, 35L, 175L, 10L, 257L, 
    14L, 156L, 242L, 196L, 145L), class = "data.frame")

First, R imported all columns as factors (which it never did before). To convert them back to character and numeric I used

x$Sample<-as.character(x$Sample)
x$Tribe<-as.character(x$Tribe)
x$Breath.d13C<-as.numeric(as.character(x$Breath.d13C))

The dput above is after having done this.

Moving on, I usually produce these graphs using the following code, which I found here:

library(dplyr)
library(ggplot2)
is_outlier <- function(x) {
  return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}

x %>%
  group_by(Tribe) %>%
  mutate(outlier = ifelse(is_outlier(Breath.d13C), Sample, as.numeric(NA))) %>%
  ggplot(., aes(x = factor(Tribe), y = Breath.d13C)) +
    geom_boxplot() +
    geom_text(aes(label = outlier), na.rm = TRUE, hjust = -0.3) +
    scale_y_continuous(name = expression(delta^13*C["Breath"]*" "("\u2030")),
                       limits = c(-25, -10),
                       breaks = seq(-25, -10, 1))

1) My first problem was that the outlier code couldn't cope with NAs (I thought the as.numeric(NA) was supposed to deal with this, but it didn't work). I tried adding na.rm or na.omit at several points in mutate without success. I ended up running x<-filter(x,!is.na(Breath.d13C)) beforehand, but would prefer a more elegant solution.

When I then run the code it gives me #Error: incompatible size (4), expecting 3 (the group size) or 1

2) I searched for similar errors, and mutate_ seems to be the recommended solution. I tried it and the code did run, but it no longer labelled the outliers. Any idea why?

Sorry if this turns out to be very simple, but I'm still a beginner in the R world.

  • Running this line at the start of your code may fix the first issue (characters becoming factors) `options(stringsAsFactors=FALSE)` – Jeroen Boeye Aug 03 '16 at 14:18
  • I can't reproduce the error. In the example dataset there appear to be no outliers so the outlier column is all `NA`. If I add obvious outliers the function works fine in `mutate`. To deal with NA values in `is_outlier` you can add `na.rm = TRUE` to all instances of `quantile` and `IQR`. – aosmith Aug 03 '16 at 14:37
  • @JeroenBoeye and @aosmith Thank you! Both of your comments really helped. The only other thing I had to do was wrap `ifelse` in `as.character`, as suggested [here](http://stackoverflow.com/questions/29224719/dplyr-error-strange-issue-when-combining-group-by-mutate-and-ifelse-is-it-a-b). Otherwise it would give me `Error: incompatible types, expecting a character vector`. @aosmith The lack of outliers is my fault (I randomly sampled my data for the example and forgot to check whether it had outliers). Sorry about that. – answer42 Aug 03 '16 at 15:41
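
For anyone landing here later, the two fixes from the comments (`na.rm = TRUE` inside `is_outlier`, and `as.character` around `ifelse`) can be combined roughly as below. This is only a sketch: the `demo` data frame is made up for illustration (it is not the asker's data) so that one group contains an obvious outlier and another contains an `NA`, and the names `demo`, `labelled` and `p` are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Tukey's 1.5 * IQR rule; na.rm = TRUE lets quantile() and IQR() skip missing values.
is_outlier <- function(v) {
  q <- quantile(v, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * IQR(v, na.rm = TRUE)
  v < q[[1]] - fence | v > q[[2]] + fence
}

# Hypothetical data: group "A" has one extreme value, group "B" has a missing value.
demo <- data.frame(
  Sample = c("A1", "A2", "A3", "A4", "A5", "A6", "B1", "B2", "B3", "B4"),
  Tribe = c(rep("A", 6), rep("B", 4)),
  Breath.d13C = c(-20.1, -20.4, -19.8, -20.0, -20.2, -12.0,
                  -18.0, -18.5, NA, -17.9),
  stringsAsFactors = FALSE
)

# as.character() keeps the new column character in every group, even in
# groups where no row is flagged and ifelse() would return a logical NA.
labelled <- demo %>%
  group_by(Tribe) %>%
  mutate(outlier = as.character(ifelse(is_outlier(Breath.d13C), Sample, NA))) %>%
  ungroup()

# Only sample "A6" ends up with a label; rows with a missing value stay NA.
p <- ggplot(labelled, aes(x = Tribe, y = Breath.d13C)) +
  geom_boxplot(na.rm = TRUE) +
  geom_text(aes(label = outlier), na.rm = TRUE, hjust = -0.3)
```

Calling `print(p)` draws the plot. With this version there should be no need to filter out the `NA` rows beforehand, since a missing value simply produces an `NA` label that `geom_text` drops.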
