
I have a dataframe, df, similar to the following:

Time  Sample_A Sample_B Sample_C
 0      0.12     0.14     0.15
 1      0.13     0.20     0.21
 2      0.31     0.34     0.36

I am reading this data in from a text file in which the number of columns varies. I would like to use ggplot to quickly plot the x value (always Time) against all of the y values (Sample A, B, C, ...) on a single graph. The names of the y variables change as well.

In essence, I'd like to avoid doing the following on repeat:

ggplot(df, aes(x = Time, y = Sample_A)) + geom_line()
ggplot(df, aes(x = Time, y = Sample_B)) + geom_line()

I have tried creating a vector that contains all of the column names and passing it as the y value to the aes function; however, it returns the number of variables rather than the values within them.

What is the most efficient way to go about this?

KVSEA
  • You should take a look first at the ggplot docs and the tutorials they link to. The ggplot paradigm is to work with long-shaped data so you can assign variables to visual encodings such as color. You also want to give column names as bare names, not strings. – camille Jan 28 '20 at 19:14
  • I didn't mean to put those as strings, my fault. They have been edited accordingly. – KVSEA Jan 28 '20 at 19:18
  • It provides insight into manually graphing a few variables, however I will likely be in the 20-30 variable range and would rather not graph 20-30 variables individually. – KVSEA Jan 28 '20 at 19:21
  • Those answers (other than the accepted one) should scale to any number of columns. That's the point of reshaping the data. But as an aside, are you sure 20 or 30 lines in one chart will be legible? – camille Jan 28 '20 at 19:22

2 Answers


This is pretty simple:

library(tidyverse)

df <- tibble(
  time = c(0, 1, 2),
  Sample_A = c(0.12, 0.13, 0.31),
  Sample_B = c(0.14, 0.20, 0.34),
  Sample_C = c(0.15, 0.21, 0.36)
)

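# Reshape to long format: one row per (time, sample) pair
df %>% 
  gather(key = sample, value = value, -time) %>% 
  # Plot one line per sample, colored by sample name
  ggplot(aes(x = time, y = value, color = sample)) +
  geom_line()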

Basically, you can gather all of the columns except the first into a "long" data frame instead of a "wide" one. Then a couple lines of ggplot code will plot the result, colored by sample.
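
In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). The following is a minimal sketch of the same reshape with the newer function, assuming tidyr >= 1.0.0 is installed and using the same example data; the name long_df is only illustrative.

```r
library(tidyr)
library(ggplot2)

# Same example data as in the answer above.
df <- data.frame(
  time = c(0, 1, 2),
  Sample_A = c(0.12, 0.13, 0.31),
  Sample_B = c(0.14, 0.20, 0.34),
  Sample_C = c(0.15, 0.21, 0.36)
)

# Reshape every column except `time` into name/value pairs.
# `long_df` is an illustrative name, not part of the original answer.
long_df <- pivot_longer(
  df,
  cols = -time,
  names_to = "sample",
  values_to = "value"
)

# One line per sample, distinguished by color.
p <- ggplot(long_df, aes(x = time, y = value, color = sample)) +
  geom_line()
p
```

Because cols = -time selects "everything except time", the same code should keep working when the file gains or loses sample columns.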


cardinal40

Use lapply to add one geom_line layer per column, like this:

# aes_string() takes column names as strings, so "time" must be quoted
ggplot(df) +
  lapply(names(df)[-1], function(i) geom_line(aes_string(x = "time", y = i)))
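
Note that aes_string() has been soft-deprecated since ggplot2 3.0.0 in favor of the .data pronoun. Below is a hedged sketch of the same per-column loop written that way, with a color mapped in each layer so that a legend appears. It assumes ggplot2 >= 3.0.0 and an x column named time; y_cols, layers, and col_name are illustrative names.

```r
library(ggplot2)

# Example data; in practice this would come from the text file.
df <- data.frame(
  time = c(0, 1, 2),
  Sample_A = c(0.12, 0.13, 0.31),
  Sample_B = c(0.14, 0.20, 0.34),
  Sample_C = c(0.15, 0.21, 0.36)
)

# Every column except the x variable is treated as a y series.
y_cols <- setdiff(names(df), "time")

# Build one geom_line layer per column. `.data[[col_name]]` looks the
# column up by its string name; mapping color to the name itself gives
# each layer its own legend entry.
layers <- lapply(y_cols, function(col_name) {
  geom_line(aes(x = time, y = .data[[col_name]], color = col_name))
})

# Adding a list of layers to a plot adds each layer in turn.
p <- ggplot(df) + layers + labs(y = "value", color = "sample")
p
```

This avoids reshaping, but for 20 to 30 series the long-format approach is usually easier to extend (faceting, grouping, summaries).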
David Jorquera