I want to make some dot plots using ggplot2, but I need to specify the colors of my data points and label some of them.

Here is what part of my data set looks like:

FactorA   Gene   P-value      logFC
  a        A       0.01         2
  a        B       0.07         3
  b        A       0.05        -1
  b        B       0.03        -0.5

So what I want is

  • If P-value > 0.05, the dot is grey,
  • If P-value < 0.05 and logFC > 0, the dot is red, and
  • If P-value < 0.05 and logFC < 0, the dot is green.

I also want the dots to be circles with a black outline, filled as described above. I only want to label the genes with P-value < 0.05, and I want the plot faceted by FactorA using facet_wrap.

How should I specify these in ggplot2?

albert
Crystal
  • Create a new column (e.g. `Pflag`) according to what you specified above, then map colours to that flag variable. For labels, use `geom_text()`, defining your label in the aesthetics (`aes()`), including the conditional label you desire (P-value < 0.05); see http://stackoverflow.com/a/15625149/4718512 – oshun Sep 10 '15 at 22:42
  • Dashes in column names do not work well in R. You should post a text file and input statements if you want answers in what might be called "standard R". – IRTFM Sep 10 '15 at 22:43
  • Can you post the `ggplot2` code that you've tried so far? – eipi10 Sep 10 '15 at 23:22

1 Answer

This works for me

# Colour flag: grey if not significant, otherwise red (logFC > 0) or green
df$new <- ifelse(df$Pvalue > 0.05, "grey",
    ifelse(df$logFC > 0, "red", "green"))
library(ggplot2)
q <- qplot(Pvalue, logFC, data = df, shape = new, fill = new, colour = new)
# Shape 21 is a filled circle with a separate outline colour
q <- q + scale_shape_manual(values = c(21, 21, 21))
q <- q + scale_fill_manual(values = c("green", "grey", "red")) + scale_colour_manual(values = c("black", "black", "black"))
# Label only the genes with Pvalue < 0.05, as asked in the question
q <- q + geom_text(aes(label = ifelse(Pvalue < 0.05, as.character(Gene), '')), hjust = 0, vjust = 0)
q + facet_wrap(~ FactorA, ncol = 2)

Credit to @oshun for the conditional geom_text
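
The code above assumes a data frame `df` with columns named `Pvalue` and `logFC`. As a minimal sketch, assuming the four sample rows from the question and with `P-value` renamed to `Pvalue` (the `label` column is only a hypothetical helper for checking the result), the flag can be built and inspected like this:

```r
# Sample rows copied from the question; "P-value" is renamed to Pvalue
# because a dash is not allowed in a syntactic R name.
df <- data.frame(
  FactorA = c("a", "a", "b", "b"),
  Gene    = c("A", "B", "A", "B"),
  Pvalue  = c(0.01, 0.07, 0.05, 0.03),
  logFC   = c(2, 3, -1, -0.5),
  stringsAsFactors = FALSE
)

# Colour flag: grey when Pvalue > 0.05, otherwise red (logFC > 0) or green.
df$new <- ifelse(df$Pvalue > 0.05, "grey",
                 ifelse(df$logFC > 0, "red", "green"))

# Hypothetical helper column: gene name for Pvalue < 0.05, empty otherwise.
# A row with Pvalue exactly 0.05 is coloured by logFC but gets no label,
# because the question does not say how to treat that boundary.
df$label <- ifelse(df$Pvalue < 0.05, df$Gene, "")

print(df)
```

Checking `df$new` and `df$label` this way before plotting makes it easier to see whether a wrong colour comes from the data or from the scales.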

Alternatively, using ggplot() instead of qplot():

library(ggplot2)
g <- ggplot(df, aes(x = Pvalue, y = logFC, fill = new)) +
         # Constant black outline and circle shape set directly in the geom
         geom_point(color = "black", shape = 21) + 
         scale_fill_manual(values = c("green", "grey", "red")) + 
         # Label only the genes with Pvalue < 0.05
         geom_text(aes(label = ifelse(Pvalue < 0.05,
                                      as.character(Gene), '')),
                   hjust = 0) +
         facet_wrap(~ FactorA, ncol = 2)
g  # print the plot
Gregor Thomas
PereG
  • Added a `ggplot` edit... I think things are complicated enough here it's nice to get the full set up. In that version, I consolidated the constant color and shape into the `geom` so that manual scales don't need to be defined. If you'd prefer I can post as a separate answer, but it didn't seem substantially different from yours. – Gregor Thomas Sep 11 '15 at 00:01
  • Thanks for sharing your improvement! – PereG Sep 11 '15 at 07:36
  • When I run the df$new = ifelse(...) line, the following shows up: Error in `$<-.data.frame`(`*tmp*`, "new", value = logical(0)) : replacement has 0 rows, data has 6 – Crystal Sep 11 '15 at 16:11
  • If we had your own data we could reproduce the problem. This type of error can be due to various reasons. – PereG Sep 11 '15 at 19:53
  • My data is in exactly the same format as shown in the question. Do I need to add a column named "new" to my dataset? – Crystal Sep 14 '15 at 16:29
  • The new column is created by that same line of code, so there is no need to prepare it beforehand... Did you write **df$new** <- ...? – PereG Sep 14 '15 at 16:37
  • Yes. I ran df$new = ifelse...., and then got that error. I read my table in csv format. Could that be the problem? – Crystal Sep 14 '15 at 16:41
  • Using df <- read.csv(...) is not a problem "per se". It would be better if you told us the output of str(df), your session info and so on, and of course a sample of the data. Check comment #2 on your question. – PereG Sep 14 '15 at 16:47
  • Yes, I think there is a problem with how I read my data. I ran str(df), and it shows only 6 observations and 0 variables. The code I used to read my table is: test = read.csv("test.csv", header=TRUE, row.names=1, sep="\t") – Crystal Sep 14 '15 at 16:50
  • Well, actually I think I should get rid of the row.names=1 to solve my problem. And now str(df) is normal, two factors and two numeric variables. – Crystal Sep 14 '15 at 16:54
  • Yes, you can with the parameter `cex`. But to ask something else you should open a new question (when your code does not work) ;) – PereG Sep 14 '15 at 17:23
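
For anyone hitting the same reading problem discussed in the comments above, here is a minimal sketch, assuming the file is tab-separated and looks like the sample in the question (the inline `txt` string is a stand-in for `test.csv`, whose real contents were never posted). It reads the table without `row.names=1` and renames the `P-value` column so that the answer's code can refer to `Pvalue`:

```r
# Stand-in for test.csv: the question's sample rows as tab-separated text.
txt <- "FactorA\tGene\tP-value\tlogFC
a\tA\t0.01\t2
a\tB\t0.07\t3
b\tA\t0.05\t-1
b\tB\t0.03\t-0.5"

# No row.names argument, so every column stays a regular variable.
df <- read.csv(text = txt, header = TRUE, sep = "\t")

# With the default check.names = TRUE, "P-value" is read as "P.value";
# rename it to match the column name used in the answer.
names(df)[names(df) == "P.value"] <- "Pvalue"

# Check the structure before creating df$new.
str(df)
```

Running `str(df)` right after reading is a quick way to confirm the number of variables and their types before any new column is added.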