I know Stack Overflow is not a code-writing service, but I am really stuck on this one and have no idea how to draw a map like this:

[Image: example plot]

The color is based on the p-value: the smaller the p-value, the brighter the color. The size of each dot is determined by the percentage overlap.

I have data of 3 samples, like this:

            Sample1                  Sample2                  Sample3   
Description percentage    p-value    Percentage    p-value    Percentage    p-value
Trendy      0.1585        0          0.1646        1.11E-016  0.2397        6.41E-014
nonTrendy   0.219         5.55E-016                           0.2203        9.84E-012
Specific    0.1713        9.99E-016  0.162         2.74E-011  0.1838        1.73E-012
nonspecific 0.2119        3.02E-013  0.1356        0.0000613  0.2044        1.1E-011
Robotics    0.1632        7.85E-013  0.1263        0.00000361 0.2158        0
human       0.2533        7.25E-012  0.1733        0.0000218  0.2069        4.16E-008

For each sample I have a percentage overlap (this percentage has not been multiplied by 100, so it is on a scale of 0 to 1) and a p-value.

Also, a few samples may have missing values (for both percentage and p-value). This happens when there is no significant overlap, as in the case of Sample2 for nonTrendy.

Please help me produce a figure like the one attached.

sodd
Angelo
  • This is probably a `ggplot2` plot made in R, with `geom_point` and mapping the `size` and `color` (and, of course, `x` and `y`) aesthetics. Search and you'll find :-) (Why don't you ask the author of the article you found this graph in, if she would share the relevant code with you?) – krlmlr May 23 '13 at 08:01
  • The email just bounces back :( – Angelo May 23 '13 at 08:03
  • Poor luck. But still, this graph is easy enough to construct. Remember to use factors for the x and y scales to get the right ordering along the axes, and `theme_bw()` to get rid of the default gray background. You might also need the `reshape` package to, well, reshape your data for usage in `ggplot2`. – krlmlr May 23 '13 at 08:07
  • Why is there a vote to close this question? Baffling. – Angelo May 23 '13 at 08:11
  • I didn't vote, but I assume this question is too specific to attract general interest. Also, the data could be formatted better (http://stackoverflow.com/q/5963269/946850), and the Python tag doesn't quite fit. These are also the reasons why I comment instead of answering your question. – krlmlr May 23 '13 at 08:15
  • Well, the Python tag was for the numpy libraries, which can be used. – Angelo May 23 '13 at 08:16
  • There are not enough values in the `nonTrendy` row. – Sven Hohenstein May 23 '13 at 08:40
  • @Sven: Sample 2 in nonTrendy does not have any values, which is why they are missing. In R I can put NaN, but I don't know what the replacement is in Python or other languages. Thank you. – Angelo May 23 '13 at 08:45
  • @Angelo You could use `np.nan` from the numpy package, or alternatively `None`, although I'd recommend the first one. – sodd May 23 '13 at 08:51
  • @nordev Thank you, but the basic question is still unsolved: how can I plot it? I was looking into ggplot2, but there is still no breakthrough on my end. – Angelo May 23 '13 at 08:54
  • If you provide the community with some good data, which we can readily paste into our own R session, you'll get an answer soon enough. See also http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. – Paul Hiemstra May 23 '13 at 08:58
  • @Angelo Because it included an error. Now it's correct. – Sven Hohenstein May 23 '13 at 09:20

1 Answer

The following script creates a plot in R. It does not look exactly like your example plot, but it can be modified.

text <- "Sample1     Sample2     Sample3     
Description percentage  p-value Percentage  p-value Percentage  p-value
Trendy  0.1585  0   0.1646  1.11E-016   0.2397  6.41E-014
nonTrendy   0.219   5.55E-016   NA     NA   0.2203  9.84E-012
Specific    0.1713  9.99E-016   0.162   2.74E-011   0.1838  1.73E-012
nonspecific 0.2119  3.02E-013   0.1356  0.0000613   0.2044  1.1E-011
Robotics    0.1632  7.85E-013   0.1263  0.00000361  0.2158  0
human   0.2533  7.25E-012   0.1733  0.0000218   0.2069  4.16E-008"

Note: two NAs were added to the data, filling the missing Sample2 values for nonTrendy.

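# Split the text into lines, then split each line on runs of spaces
lines <- readLines(textConnection(text), 8)
strings <- strsplit(lines, " +")
sam <- strings[[1]]                          # sample names
des <- unlist(lapply(strings[-1], "[", 1))   # "Description" plus the row labels
coln <- sub("-", "", strings[[2]][-1][1:2])  # "percentage", "pvalue"
# Numeric matrix: one row per description, columns alternate percentage / p-value
val <- do.call(rbind, lapply(strings[-(1:2)], function(x) as.numeric(x[-1])))

# Odd columns hold the percentages, even columns the p-values
perc <- as.vector(val[ , as.logical(seq(ncol(val)) %% 2)])
pval <- as.vector(val[ , !seq(ncol(val)) %% 2])

# Long format: one row per description/sample combination
dat <- setNames(data.frame(des[-1], perc, pval), c(des[1], coln))
dat$sample <- rep(sam, each = nrow(val))

library(ggplot2)
# Colour encodes the p-value, point size the percentage overlap
ggplot(dat, aes(colour = pvalue, size = percentage, 
                x = sample, y = Description)) +
  geom_point() + 
  theme_bw()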

[Image: resulting plot]
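
One possible modification, sketched under assumptions: the question asks for smaller p-values to appear brighter, but the p-values here span many orders of magnitude, so colouring by -log10(p) may separate them better than the raw values. The column name `neglog10p`, the floor of 1e-16 applied to p-values reported as 0, and the black-to-red gradient are illustrative choices, not something taken from the original figure. The data frame is typed in directly from the question's table so that the snippet runs on its own.

```r
library(ggplot2)

# Data from the question in long format: one row per description/sample pair.
descr <- c("Trendy", "nonTrendy", "Specific", "nonspecific", "Robotics", "human")
dat <- data.frame(
  Description = rep(descr, times = 3),
  sample      = rep(c("Sample1", "Sample2", "Sample3"), each = 6),
  percentage  = c(0.1585, 0.219, 0.1713, 0.2119, 0.1632, 0.2533,
                  0.1646, NA, 0.162, 0.1356, 0.1263, 0.1733,
                  0.2397, 0.2203, 0.1838, 0.2044, 0.2158, 0.2069),
  pvalue      = c(0, 5.55e-16, 9.99e-16, 3.02e-13, 7.85e-13, 7.25e-12,
                  1.11e-16, NA, 2.74e-11, 6.13e-5, 3.61e-6, 2.18e-5,
                  6.41e-14, 9.84e-12, 1.73e-12, 1.1e-11, 0, 4.16e-8)
)

# Assumption: p-values reported as 0 are floored at 1e-16 so that
# -log10() stays finite. NA values are left as NA.
p_floor <- 1e-16
dat$neglog10p <- -log10(pmax(dat$pvalue, p_floor))

# Larger -log10(p) means a smaller p-value, drawn here in brighter red.
p <- ggplot(dat, aes(x = sample, y = Description,
                     colour = neglog10p, size = percentage)) +
  geom_point() +
  scale_colour_gradient(low = "black", high = "red", name = "-log10(p)") +
  theme_bw()
print(p)
```

Rows with missing values (Sample2 for nonTrendy) are simply not drawn; ggplot2 reports them in a warning when the plot is printed.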

Sven Hohenstein
  • Last question: instead of space-separated data (as in the example), how can I deal with tab-separated data? Is it possible to read the data from a tab-separated file and plot the same? Thank you – Angelo May 23 '13 at 09:49
  • @Angelo It should work if you replace `" +"` in the `strsplit` function with `"\t"`. – Sven Hohenstein May 23 '13 at 11:05
  • Thank you, it did work. One more: how can I change the colour code to red instead of blue? – Angelo May 23 '13 at 11:10
  • @Angelo Add `scale_colour_continuous(high = "red")` to the plot. – Sven Hohenstein May 23 '13 at 11:12
  • Definitely the last one :) Sorry to bother you again: is there a way to reduce the white space in the figure? A lot of white space is spoiling the party :( – Angelo May 23 '13 at 11:43
  • @Angelo You can decrease the size of the figure (it's easy to do it in the plot window). – Sven Hohenstein May 23 '13 at 11:47
  • Is there a way to stop ggplot from sorting the data, so that it plots in the order the data were entered? – Angelo May 23 '13 at 13:10
  • @Angelo You can manually specify the order of the factor levels with the `factor` function. This has to be done before plotting. For example, `myfactor <- factor(myfactor, levels = c("a", "c", "d", "b"))`. – Sven Hohenstein May 23 '13 at 13:12
  • How do I incorporate that into this code so that it keeps the order of the input data and no sorting is done? Kindly help. – Angelo May 23 '13 at 13:44
  • @Angelo `dat$Description <- factor(dat$Description, levels = rev(c("Trendy", "nonTrendy", "Specific", "nonspecific", "Robotics", "human")))` – Sven Hohenstein May 23 '13 at 13:58