Plotting each column of a dataframe as one line using ggplot

Question

The whole dataset describes a module (or cluster if you prefer).

In order to reproduce the example, the dataset is available at: https://www.dropbox.com/s/y1905suwnlib510/example_dataset.txt?dl=0

(54kb file)

You can read it with:

dataset <- read.table(file='example_dataset.txt')

What I would like to have in my plot is this

On the plot, the x-axis is my Timepoints column, and the y-axis shows every column of the dataset except the last 3. I then used facet_wrap() to group by the ConditionID column.

This is exactly what I want, but the way I achieved this was with the following code:

plot <- ggplot(dataset, aes(x=Timepoints))
plot <- plot + geom_line(aes(y=dataset[,1],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,2],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,3],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,4],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,5],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,6],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,7],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,8],colour = dataset$InModule))
...

As you can see, it is not very automated. I thought about putting it in a loop, like

columns <- dim(dataset)[2] - 3
for (i in seq(1:columns))
{
  plot <- plot + geom_line(aes(y=dataset[,i],colour = dataset$InModule))
}
(plot <- plot + facet_wrap(  ~ ConditionID, ncol=6) )

That doesn't work. I found the topic "Use for loop to plot multiple lines in single plot with ggplot2", which corresponds to my problem, and I tried the solution given there with the melt() function.

The problem is that when I use melt on my dataset, I lose the information in the Timepoints column that I need for my x-axis. This is what I did:

data_melted <- dataset
as.character(data_melted$Timepoints)
dataset_melted <- melt(data_melted)

I tried using aggregate

aggdata <-aggregate(dataset, by=list(dataset$ConditionID), FUN=length)

Now with aggdata I at least know how many Timepoints I have for each ConditionID, but I don't know how to proceed from here and combine this with ggplot.

Can anyone suggest an approach? I know I could use the ugly solution of creating new datasets in a loop with rbind (also given in that link), but I don't want to do that, as it sounds really inefficient. I want to learn the right way.

Thanks

To convert your data into long format (using e.g. `melt`) is the standard `ggplot` way here I would say. Please provide a **minimal, self contained example** (see e.g. [**here**](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610)) and show your attempts using `melt`. — Henrik, Nov 19 '14 at 14:27
I suggest you post sample data directly in the question so that folks here can test before putting forward their solution. — KFB, Nov 19 '14 at 14:28
thanks for the feedback. I added the data now for reproducibility. — Rafael Santos, Nov 19 '14 at 15:22

score 3 · Accepted Answer · answered Nov 19 '14 at 15:59

You have to specify id.vars in your call to melt.data.frame to keep all information you need. In the call to ggplot you then need to specify the correct grouping variable to get the same result as before. Here's a possible solution:

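library(reshape2)  # provides melt()
library(ggplot2)
# Keep the three descriptive columns as ids; all other columns are stacked into variable/value
melted <- melt(dataset, id.vars=c("Timepoints", "InModule", "ConditionID"))
p <- ggplot(melted, aes(Timepoints, value, color = InModule)) +
  geom_line(aes(group=paste0(variable, InModule)))  # one line per original column and InModule value
p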
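
For anyone who cannot download the file, here is a minimal self-contained sketch of the same idea on made-up data. The data frame `toy` and its columns `geneA`, `geneB`, `geneC` are hypothetical stand-ins for the real measurement columns, and the `facet_wrap()` call is taken from the question, not from the code above, so treat this as an illustration under those assumptions.

```r
library(reshape2)
library(ggplot2)

set.seed(1)
# Hypothetical toy data: 3 measurement columns plus the 3 descriptive columns
toy <- data.frame(
  geneA       = rnorm(8),
  geneB       = rnorm(8),
  geneC       = rnorm(8),
  Timepoints  = rep(1:4, times = 2),
  InModule    = factor(rep("yes", 8)),
  ConditionID = factor(rep(c("cond1", "cond2"), each = 4))
)

# Long format: one row per (original row, measurement column) pair.
# The id columns are repeated next to each value, so Timepoints survives.
melted <- melt(toy, id.vars = c("Timepoints", "InModule", "ConditionID"))

# One line per original column, one panel per condition
p <- ggplot(melted, aes(Timepoints, value, colour = InModule)) +
  geom_line(aes(group = variable)) +
  facet_wrap(~ ConditionID, ncol = 2)
p
```

Because each panel only contains one condition, grouping by `variable` is enough here to draw one line per original column.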

answered by shadow

Thanks! This solved my problem. However, I'm a bit confused. Why, when I don't specify id.vars, did melt automatically keep almost all the columns I needed except for the last one? What are the criteria here? Is it because it recognized every column as numeric, then finally found columns that were factors, and assumed that all the columns up to that point were the correct ones? Also, the way you did it, you are saying that the Timepoints column works as an id, which is not true. They are values like the other columns, but they only make sense when grouped by the condition they represent. – Rafael Santos Nov 19 '14 at 17:01
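
On the follow-up question: as far as I know, when neither id.vars nor measure.vars is given, reshape2's melt treats the discrete columns (factor, character, logical) as id variables and melts every numeric column, wherever it sits in the data frame. A numeric Timepoints column therefore gets stacked into `value` like any measurement. Note also that `as.character(data_melted$Timepoints)` in the question does not assign its result, so the column stays numeric. The sketch below uses a hypothetical two-row data frame to show the difference; "id" here only means "carried along unchanged next to each value", not "unique key".

```r
library(reshape2)

# Hypothetical toy data: two measurement columns, a numeric Timepoints, a factor
toy <- data.frame(
  geneA       = c(0.1, 0.2),
  geneB       = c(0.3, 0.4),
  Timepoints  = c(1, 2),
  ConditionID = factor(c("cond1", "cond1"))
)

# No id.vars: only the factor column is kept as an id,
# so Timepoints is melted together with geneA and geneB
auto <- melt(toy)
levels(auto$variable)

# Explicit id.vars: Timepoints stays a column and can be mapped to the x-axis
explicit <- melt(toy, id.vars = c("Timepoints", "ConditionID"))
names(explicit)
```

When it has to guess, melt also prints a message naming the columns it picked as id variables, which is a quick way to check what happened.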
