
Here is a sample dataset to illustrate the question:

s <- 
"F V1  V2  P
0 0.5 0.7  0
0 1.5 1.7  1
1 0.7 0.9  0
1 1.7 1.9  1
"
d <- read.delim(textConnection(s), sep="")

I want to plot this data in a single plot using ggplot such that:

  • the x-axis shows P
  • the y-axis shows both V1 (triangles) and V2 (squares)
  • points with F = 0 are red and points with F = 1 are blue.

That is, I want to plot two columns of the data frame with different markers, with the color of every point determined by F.

Thanks.

EDIT: I believe this is not a duplicate question. In the answer mentioned, the data frame is melted, but when I melt my data I also lose the F column, which defines the color, so that solution doesn't work.

ivan
    Much of the time in ggplot2 you will want your data in a "long" format instead of a "wide" format. See examples, e.g., [here](https://stackoverflow.com/questions/9531904/plot-multiple-columns-on-the-same-graph-in-r) and [here](https://stackoverflow.com/questions/4877357/how-to-plot-all-the-columns-of-a-data-frame-in-r). – aosmith Oct 05 '18 at 17:09

2 Answers

There are two options here:

  1. As there are only two value columns, they can be plotted by separate calls to geom_point(). This is not recommended in general and will not produce an appropriate legend for the shapes, but it gives a quick answer.
  2. The recommended way for ggplot2 is to reshape the value columns from wide to long format (thereby using F and P as id variables, so the color indicator F isn't lost).

1. Plot data in wide format

library(ggplot2)
g <- ggplot(d, aes(factor(P), color = factor(F))) + 
  geom_point(aes(y = V1), shape = "triangle") +
  geom_point(aes(y = V2), shape = "square")
g

With some polishing:

g +
  ylab("V1, V2") +
  xlab("P") +
  scale_colour_manual(name = "F", values = c("red", "blue"))
  

Note that both F and P are explicitly turned into discrete variables with factor().
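To illustrate what that conversion does, here is a small base R sketch (the names `f_levels` and `colour_map` are illustrative and not part of the answer). factor() turns the numeric 0/1 codes into discrete levels, and a manual colour scale assigns unnamed colours to those levels in order; naming the vector by level makes the pairing explicit.

```r
# Sample data from the question
d <- read.table(header = TRUE, text = "
F V1  V2  P
0 0.5 0.7  0
0 1.5 1.7  1
1 0.7 0.9  0
1 1.7 1.9  1
")

# Discrete levels of F, in the order a manual scale would use them
f_levels <- levels(factor(d$F))

# Pair each level with a colour by name instead of relying on position
colour_map <- setNames(c("red", "blue"), f_levels)

# Colour each row of d would receive under this mapping
point_colours <- unname(colour_map[as.character(d$F)])
print(colour_map)
print(point_colours)
```

A named vector like `colour_map` can be passed as `values = colour_map` to scale_colour_manual().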

2. Plot data in long format

library(reshape2)
# reshape data from wide to long format
long <- melt(d, c("F", "P"))
g <- ggplot(long, aes(factor(P), value, shape = variable, color = factor(F))) + 
  geom_point()
g

With some polishing:

g +
  xlab("P") +
  scale_colour_manual(name = "F", values = c("red", "blue")) +
  scale_shape_manual(values = c("triangle", "square"))

When reshaping from wide to long format, it is important to specify which variables are id variables, which are repeated in every row, and which are measure variables, which make up the value column in the long format.

So,

melt(d, c("F", "P"))

and

melt(d, measure.vars = c("V1", "V2"))

produce the same result:

  F P variable value
1 0 0       V1   0.5
2 0 1       V1   1.5
3 1 0       V1   0.7
4 1 1       V1   1.7
5 0 0       V2   0.7
6 0 1       V2   1.7
7 1 0       V2   0.9
8 1 1       V2   1.9
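
To make the roles of id and measure variables concrete, the following base R sketch builds the long table by hand from d, without reshape2. The helper names (`id_vars`, `measure_vars`, `long_by_hand`) are illustrative only; melt() remains the convenient way to do this.

```r
# Sample data from the question
d <- read.table(header = TRUE, text = "
F V1  V2  P
0 0.5 0.7  0
0 1.5 1.7  1
1 0.7 0.9  0
1 1.7 1.9  1
")

id_vars      <- c("F", "P")    # repeated once per measure column
measure_vars <- c("V1", "V2")  # stacked into a single value column

# For each measure column, keep the id columns and attach that column's
# name and values; then stack the pieces on top of each other.
long_by_hand <- do.call(rbind, lapply(measure_vars, function(v) {
  data.frame(d[id_vars], variable = v, value = d[[v]])
}))
long_by_hand$variable <- factor(long_by_hand$variable, levels = measure_vars)
rownames(long_by_hand) <- NULL

print(long_by_hand)
```

Each row of d appears once per measure column, so F stays attached to every value and can still be mapped to color.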

(For the sake of completeness, the data.table version of melt() understands pattern matching on column names, e.g., melt(d, measure.vars = patterns("V")).)

Uwe

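Instead of reshape2::melt, tidyr::gather is a good alternative. You select the columns to gather with the same syntax as select in dplyr, give the name of the new column that holds the old column names in the key argument, and give the name of the column that holds the corresponding values in the value argument.

Here, to avoid losing F, exclude it (together with P) from the gathered columns: gather(-P, -F, key = "V", value = "value")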

    s <- 
    "F V1  V2  P
    0 0.5 0.7  0
    0 1.5 1.7  1
    1 0.7 0.9  0
    1 1.7 1.9  1
    "
    d <- read.delim(textConnection(s), sep="")
    library(tidyverse)
    library(ggplot2)
    d %>%
      rename(f = F) %>% # rename so the column is not confused with F, the shorthand for FALSE
      gather(-P, -f, key = "V", value = "value") %>% # tidyr::gather
      ggplot(aes(x = P, y = value, shape = V, color = factor(f))) +
      geom_point() +
      geom_line() +
      scale_color_manual(name = "F", values = c("0" = "red", "1" = "blue")) +
      scale_shape_manual(name = "V", values = c("V1" = 2, "V2" = 0))
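
In current tidyr releases gather() is superseded by pivot_longer(). A minimal sketch of the same reshape, assuming tidyr 1.0.0 or later is installed (the name `long` is illustrative and not from the answer):

```r
library(tidyr)

# Sample data from the question
d <- read.table(header = TRUE, text = "
F V1  V2  P
0 0.5 0.7  0
0 1.5 1.7  1
1 0.7 0.9  0
1 1.7 1.9  1
")

# Stack V1 and V2 into one value column; F and P are kept as id columns
long <- pivot_longer(d, cols = c(V1, V2), names_to = "V", values_to = "value")
print(long)
```

The resulting `long` can be piped into the same ggplot() call as above, with `factor(F)` in place of `factor(f)` if the column is not renamed.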
younggeun