0

I would like to add another level of complexity to the color coding scheme I have going on in the below plot. I want to account for whether each of the values being plotted has passed a statistical test. So, the dots will only be color coded based on the percentile if they pass the test, otherwise, I would like the dot to be grey.

Here is my code as I have it after all the helpful suggestions I received from my first post Color code points based on percentile in ggplot (note: this is some made up data, though I have real data which has many more entries:

dat <- data.frame(key = c("a1-a3", "a1-a2"), position = 1:100, fst = rlnorm(200, 0, 1), fet = rnorm(200, 0.24, 0.54))

#Get quantiles
quants <- quantile(dat$fst, c(0.95, 0.99))

dat$quant  <- with(dat, factor(ifelse(fst < quants[1], 0,
                                  ifelse(fst < quants[2], 1, 2))))

dat$fisher <- with(dat, factor(ifelse(fet > 1.30102999566398, 0, 1)))

dat$col <- with(dat, factor(ifelse(fet < 1.30102999566398, 3, quant)))

########theme set
theme_set(theme_bw(base_size = 10))

p1 <- ggplot(dat, aes(x=position, y=fst)) +
  geom_point(aes(colour = col, size=0.2)) +
  facet_wrap(~key, nrow = 1) +
  scale_colour_manual(values = c("black", "blue", "red", "grey"), labels = c("0-95", "95-99", "99-100", "fail")) +
  ylab(expression(F[ST])) +
  xlab("Genomic Position (Mb)") +
  scale_x_continuous(breaks=c(0, 1e+06, 2e+06, 3e+06, 4e+06), labels=c("0", "1", "2", "3", "4")) +
  scale_y_continuous(limits=c(0,1)) +
  theme(plot.background = element_blank(),
    panel.background = element_blank(),
    panel.border = element_blank(),
    legend.position="none",
    legend.title = element_blank()
    )

tiff(Fstvalues_colourcode3.tiff", height=2.5, width=6.5, units="in", res = 300, pointsize="10")
p1
dev.off()

My problem is in the line: dat$col <- with(dat, factor(ifelse(fet < 1.30102999566398, 3, quant))). I want it to use the value from the $quant if it has an $fet value above the above listed value (or fisher == 0), and if it has an $fet value below, I would like it to make a new factor (3). When I look at the data frame it is doing something different than this. Any comments/suggestions are much appreciated! (I'm pretty new to coding and am finding factors are not easy to work with!!)

Community
  • 1
  • 1
ONeillMB1
  • 343
  • 6
  • 19

1 Answers1

2

Yes, your are right, with(dat, factor(ifelse(fet < 1.30102999566398, 3, quant))) gives an 'unexpected' result. Your no return value in ifelse, the factor quant, is coerced to the same class as the yes return value (3), a numeric. Have a look at tail(dat[order(dat$fet), c("fet", "quant", "col")]):

#          fet quant col
# 6   1.202582     0   3
# 40  1.318997     0   1
# 74  1.324552     0   1
# 24  1.415189     1   2
# 38  1.418230     0   1
# 123 1.531584     0   1 

For fet > 1.301 (the test in ifelse), 'col' became 1, 1, 2, 1, 1, instead of 0, 0, 1, 0, 0. Something like this happened:

# original factor version of quant
quant <- as.factor(0:2)
quant
# [1] 0 1 2
# Levels: 0 1 2

# coerce quant to numeric
as.numeric(quant)
# [1] 1 2 3

Compare these two:

set.seed(1)
df <- data.frame(fet = rnorm(9), quant = factor(0:2))
str(df)
df$col <- with(df, ifelse(fet < 0, 3, quant))
df

set.seed(1)
df <- data.frame(fet = rnorm(9), quant = 0:2)
str(df)
df$col <- with(df, ifelse(fet < 0, 3, quant))
df

Thus, try to remove factor from your ifelse call where you create 'quant' and see if it solves the problem.

See also 8.2.1 here: http://www.burns-stat.com/pages/Tutor/R_inferno.pdf‎.

PS. As you phrase your question, it is the single ifelse line is your actual problem (not the plotting part). If so, you may wish to isolate this problem and condense your question.

Henrik
  • 65,555
  • 14
  • 143
  • 159
  • This is super helpful, but it is still behaving in an unpredictable way. Teaching duties, lab work and fellowship apps have kept me from playing with the code in a while, but I will take a look at it again this week and write an update and try to get to the bottom of it. I can only seem to get one color scheme or the other working at a time on the real data. – ONeillMB1 Nov 11 '13 at 05:36