ggplot: plot two columns of data frame

Question

Here is a sample dataset to illustrate the question:

s <- 
"F V1  V2  P
0 0.5 0.7  0
0 1.5 1.7  1
1 0.7 0.9  0
1 1.7 1.9  1
"
d <- read.delim(textConnection(s), sep="")

I want to draw this data in a single plot using ggplot such that:

the x-axis shows P
the y-axis shows both V1 (triangles) and V2 (squares)
points with F = 0 are red and points with F = 1 are blue

That is, I want to plot two columns of the data frame with different markers such that the color of every point is defined by F.

Thanks.

EDIT: I believe this is not a duplicate question. In the linked answer the data frame is melted, but when I melt my data I also lose the F column, which defines the color, so that solution doesn't work.

Much of the time in ggplot2 you will want your data in a "long" format instead of a "wide" format. See examples, e.g., [here](https://stackoverflow.com/questions/9531904/plot-multiple-columns-on-the-same-graph-in-r) and [here](https://stackoverflow.com/questions/4877357/how-to-plot-all-the-columns-of-a-data-frame-in-r). — aosmith, Oct 05 '18 at 17:09

score 3 · Accepted Answer · edited Jun 20 '20 at 09:12

There are two options here:

As there are only two value columns, they can be plotted by separate calls to geom_point(). This is not recommended in general and will not produce an appropriate legend, but it gives a quick answer.
The recommended way for ggplot2 is to reshape the value columns from wide to long format, using F and P as id variables so that the color indicator F isn't lost.

1. Plot data in wide format

library(ggplot2)
g <- ggplot(d, aes(factor(P), color = factor(F))) + 
  geom_point(aes(y = V1), shape = "triangle") +
  geom_point(aes(y = V2), shape = "square")
g

With some polishing:

g +
  ylab("V1, V2") +
  xlab("P") +
  scale_colour_manual(name = "F", values = c("red", "blue"))

Note that both F and P are explicitly turned into discrete variables.
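
The wide-format plot above has no shape legend. One possible workaround, offered as a sketch and not as part of the original answer, is to map shape to a constant label inside aes() in each layer and then assign the actual shapes with scale_shape_manual(). The object name g_wide and the legend title "variable" are illustrative choices.

```r
library(ggplot2)

# Sample data from the question
d <- read.table(header = TRUE, text = "F V1  V2  P
0 0.5 0.7  0
0 1.5 1.7  1
1 0.7 0.9  0
1 1.7 1.9  1")

# Mapping shape to a constant string inside aes() makes ggplot2 build a
# shape legend even though the data stay in wide format.
g_wide <- ggplot(d, aes(factor(P), color = factor(F))) +
  geom_point(aes(y = V1, shape = "V1")) +
  geom_point(aes(y = V2, shape = "V2")) +
  scale_shape_manual(name = "variable",
                     values = c(V1 = "triangle", V2 = "square")) +
  scale_colour_manual(name = "F", values = c("red", "blue")) +
  labs(x = "P", y = "V1, V2")
g_wide
```

This still needs one geom_point() call per value column, so the long-format approach remains the more scalable choice.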

2. Plot data in long format

library(reshape2)
# reshape data from wide to long format
long <- melt(d, c("F", "P"))
g <- ggplot(long, aes(factor(P), value, shape = variable, color = factor(F))) + 
  geom_point()
g

With some polishing:

g +
  xlab("P") +
  scale_colour_manual(name = "F", values = c("red", "blue")) +
  scale_shape_manual(values = c("triangle", "square"))

When reshaping from wide to long format, it is important to specify which variables are id variables, which will be repeated in every row, and which are the measure variables, which will make up the value column in the long format.

So,

melt(d, c("F", "P"))

and

melt(d, measure.vars = c("V1", "V2"))

produce the same result:

  F P variable value
1 0 0       V1   0.5
2 0 1       V1   1.5
3 1 0       V1   0.7
4 1 1       V1   1.7
5 0 0       V2   0.7
6 0 1       V2   1.7
7 1 0       V2   0.9
8 1 1       V2   1.9

(For the sake of completeness, the data.table version of melt() understands pattern matching on column names, e.g., melt(d, measure.vars = patterns("V")), provided d is a data.table.)
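
To make the distinction between id variables and measure variables concrete, here is a minimal base R sketch of roughly what the reshaping does. It is not how reshape2 implements melt(), and the names id_vars and measure_vars are only illustrative. Unlike melt(), this sketch may leave the variable column as character instead of factor.

```r
# Sample data from the question
d <- read.table(header = TRUE, text = "F V1  V2  P
0 0.5 0.7  0
0 1.5 1.7  1
1 0.7 0.9  0
1 1.7 1.9  1")

id_vars      <- c("F", "P")    # repeated for every measure column
measure_vars <- c("V1", "V2")  # stacked into a single value column

# For each measure column, keep the id columns, record the column name
# in `variable` and its contents in `value`, then stack the pieces.
long <- do.call(rbind, lapply(measure_vars, function(v) {
  data.frame(d[id_vars], variable = v, value = d[[v]])
}))
long
```

Because F is listed among the id variables, it is carried along into every row and stays available for the color mapping.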

score 1 · Answer 2 · answered Oct 06 '18 at 02:11

Instead of reshape2::melt, tidyr::gather might be a good alternative. You specify the variables to gather in the same way as with select in dplyr, and pass the name of the new key column to the key argument. The value argument gives the name of the column that holds the corresponding values.

Here, so as not to lose F: gather(-P, -F, key = "V", value = "value")

    s <- 
    "F V1  V2  P
    0 0.5 0.7  0
    0 1.5 1.7  1
    1 0.7 0.9  0
    1 1.7 1.9  1
    "
    d <- read.delim(textConnection(s), sep="")
    library(tidyverse) # attaches tidyr, dplyr and ggplot2
    d %>%
      rename(f = F) %>% # just not to confuse with FALSE
      gather(-P, -f, key = "V", value = "value") %>% # tidyr::gather
      ggplot(aes(x = P, y = value, shape = V, color = factor(f))) +
      geom_point() +
      geom_line() +
      scale_color_manual(name = "F", values = c("0" = "red", "1" = "blue")) +
      scale_shape_manual(name = "V", values = c("V1" = 2, "V2" = 0))
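
In current tidyr releases gather() is superseded by pivot_longer(). A minimal sketch of the same reshaping and plot with pivot_longer() might look like the following. The object names long and p are illustrative, and geom_line() is left out because the question only asks for points.

```r
library(tidyr)
library(ggplot2)

# Sample data from the question
d <- read.table(header = TRUE, text = "F V1  V2  P
0 0.5 0.7  0
0 1.5 1.7  1
1 0.7 0.9  0
1 1.7 1.9  1")

# `cols` names the measure columns; F and P are not listed, so they are
# kept as id columns and repeated for every reshaped row.
long <- pivot_longer(d, cols = c(V1, V2), names_to = "V", values_to = "value")

p <- ggplot(long, aes(x = P, y = value, shape = V, color = factor(F))) +
  geom_point() +
  scale_color_manual(name = "F", values = c("0" = "red", "1" = "blue")) +
  scale_shape_manual(name = "V", values = c("V1" = 2, "V2" = 0))
p
```

Selecting the measure columns positively with cols = c(V1, V2) is equivalent here to excluding the id columns with cols = -c(F, P).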
