I have a data frame called `mydf`. It has a `gene` column with unique gene names, plus `searched_*` columns (how many individuals each gene was searched in) and `found_*` columns (how many individuals it was found in). I want to plot this in R but am not sure of the best way. I would like the searched bars and the found bars to sit on top of each other. Is that possible?

    mydf <- structure(c("FLT3-TKD", "DNMT3A", "IDH1", "190", "0", "190",
    "5.26315789473684", "NaN", "6.8421052631579", "186", "0", "188",
    "4.83870967741935", "NaN", "7.97872340425532", "123", "0", "123",
    "7.31707317073171", "NaN", "8.13008130081301"), .Dim = c(3L,
    7L), .Dimnames = list(NULL, c("gene", "searched_man", "found_man",
    "searched_cat", "found_cat", "searched_goat", "found_goat")))
MAPK
  • In general, ggplot2 works better with [tidy data](http://vita.had.co.nz/papers/tidy-data.html). Formatting your data into tidy data will also make it easier for people to help you. – CPhil Feb 07 '16 at 09:21

1 Answer

A solution using base R for the data preparation and ggplot2 for the plots:

Read in data

Your code produces a character matrix, which cannot be plotted directly. Convert it to a data frame with numeric columns first:

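    # Turn the character matrix into a data frame
    mydf <- as.data.frame(mydf)
    # Convert every column except gene to numeric (works for character or factor columns)
    mydf[, -1] <- lapply(mydf[, -1], function(x) as.numeric(as.character(x)))
    str(mydf)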
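The `as.numeric(as.character(x))` idiom matters when the columns come back as factors, which `as.data.frame()` did by default before R 4.0.0. A small illustration of the pitfall, using a made-up vector `x` rather than the real data:

```r
# Hypothetical factor holding numbers as labels, like a searched_* column in older R
x <- factor(c("190", "0", "190"))

# Converting the factor directly returns the internal level codes, not the values
as.numeric(x)                # 2 1 2

# Going through character first recovers the actual numbers
as.numeric(as.character(x))  # 190 0 190
```

On character columns the extra `as.character()` is a no-op, so the same line is safe in either case.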

Get it into long format

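    # Columns 2, 4, 6 hold searched_*; columns 3, 5, 7 hold found_*
    # unlist() stacks them in man, cat, goat order, matching the animal labels
    mydf2 <- data.frame(gene = mydf$gene,
                        animal = rep(c('man', 'cat', 'goat'), each = nrow(mydf)),
                        searched = unlist(mydf[, seq(2, ncol(mydf) - 1, 2)]),
                        found = unlist(mydf[, seq(3, ncol(mydf), 2)]),
                        row.names = NULL)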

Gives:

          gene animal searched    found
    1 FLT3-TKD    man      190 5.263158
    2   DNMT3A    man        0      NaN
    3     IDH1    man      190 6.842105
    4 FLT3-TKD    cat      186 4.838710
    5   DNMT3A    cat        0      NaN
    6     IDH1    cat      188 7.978723
    7 FLT3-TKD   goat      123 7.317073
    8   DNMT3A   goat        0      NaN
    9     IDH1   goat      123 8.130081
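The same long format can probably be reached in one step with `tidyr::pivot_longer()`, which splits column names such as `searched_man` on the underscore. This is a sketch, assuming tidyr 1.0.0 or later is installed; the data frame is retyped here with numeric columns so the block runs on its own, and `mydf_long` is just an illustrative name.

```r
library(tidyr)

# Same values as in the question, already stored as numeric columns
mydf <- data.frame(
  gene          = c("FLT3-TKD", "DNMT3A", "IDH1"),
  searched_man  = c(190, 0, 190),
  found_man     = c(5.26315789473684, NaN, 6.8421052631579),
  searched_cat  = c(186, 0, 188),
  found_cat     = c(4.83870967741935, NaN, 7.97872340425532),
  searched_goat = c(123, 0, 123),
  found_goat    = c(7.31707317073171, NaN, 8.13008130081301)
)

# ".value" keeps the part before "_" (searched / found) as column names,
# while the part after "_" goes into a new animal column
mydf_long <- pivot_longer(
  mydf,
  cols      = -gene,
  names_to  = c(".value", "animal"),
  names_sep = "_"
)

print(mydf_long)
```

The rows come out grouped by gene rather than by animal, but the content matches the table above.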

Since you did not say what the plot should show, here is one example, the ratio of found to searched per animal with one panel per gene:

    library(ggplot2)
    # Bar height is found / searched; rows where searched is 0 give NaN and are dropped
    ggplot(mydf2, aes(x = animal, y = found / searched)) +
      geom_bar(stat = 'identity') +
      facet_wrap(~gene)

Another example, a stacked bar chart of found and not found:

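    # Split each searched total into a found part and a not-found part
    mydf2$not_found <- mydf2$searched - mydf2$found
    # Stack the found and not_found columns into type/val pairs
    mydf3 <- tidyr::gather(mydf2, 'type', 'val', found:not_found)

    # Stacked bars: the full bar height equals searched
    ggplot(mydf3, aes(x = animal, y = val, fill = type)) +
      geom_bar(stat = 'identity') +
      facet_wrap(~gene)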

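To print the actual numbers on a stacked plot like this, one option is `geom_text()` with `position_stack()`. The sketch below also makes an assumption that the question does not confirm: the `found` values look like percentages of `searched` (for example, 5.263158 equals 10 / 190 × 100), so they are converted to counts before stacking. `found_n`, `counts` and `p` are illustrative names.

```r
library(ggplot2)

# Long-format data as in mydf2 above (gene, animal, searched, found)
mydf2 <- data.frame(
  gene     = rep(c("FLT3-TKD", "DNMT3A", "IDH1"), times = 3),
  animal   = rep(c("man", "cat", "goat"), each = 3),
  searched = c(190, 0, 190, 186, 0, 188, 123, 0, 123),
  found    = c(5.26315789473684, NaN, 6.8421052631579,
               4.83870967741935, NaN, 7.97872340425532,
               7.31707317073171, NaN, 8.13008130081301)
)

# ASSUMPTION: found is a percentage of searched, so convert it to a count.
# Rows with searched == 0 carry NaN and are treated as 0 found.
found_n <- ifelse(is.nan(mydf2$found), 0,
                  round(mydf2$found / 100 * mydf2$searched))

# One row per bar segment; found + not_found adds up to searched
counts <- rbind(
  data.frame(mydf2[c("gene", "animal")], type = "found",     val = found_n),
  data.frame(mydf2[c("gene", "animal")], type = "not_found", val = mydf2$searched - found_n)
)

# position_stack(vjust = 0.5) centres each label inside its own segment
p <- ggplot(counts, aes(x = animal, y = val, fill = type)) +
  geom_col() +
  geom_text(aes(label = val), position = position_stack(vjust = 0.5)) +
  facet_wrap(~gene)
print(p)
```

If `found` really holds counts, skip the conversion and use `found_n <- mydf2$found` with the NaN rows handled however suits the data.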

Axeman
  • Thanks, but it does not show the `found` and `searched` groups of data. – MAPK Feb 07 '16 at 10:02
  • I would like to see searched bars and the found stats sitting on top of each searched bar. Is that possible to get? – MAPK Feb 07 '16 at 10:03
  • It shows the proportion of found. Again, I knew little about your data and only stated that you wanted 'a graph'. You want a stacked bargraph? – Axeman Feb 07 '16 at 10:04
  • Category-wise, your graph looks good, but I also wanted the found and searched data to be represented. – MAPK Feb 07 '16 at 10:06
  • Thanks a lot for your help. – MAPK Feb 07 '16 at 12:11
  • How do you put actual numbers in the second plot? – MAPK Feb 07 '16 at 12:14
  • Maybe search first. For example: http://stackoverflow.com/questions/6455088/how-to-put-labels-over-geom-bar-in-r-with-ggplot2 – Axeman Feb 07 '16 at 12:15