Plotting multiple line graphs in one using R

Question

I have a dataset with the number of reads for more than 3000 organisms, obtained at 3 different stages of an experiment. The data looks something like this:

          rain      day0      day7
 org1     923857    505062    503292
 org2     424002    198440    26314
 org3     2910      1492      535

...with 3000 more rows trailing this.

I want to plot the trend (number of reads) for each organism across the different stages (rain, day0, day7). Each organism should be represented with a different color, and all of them should be in the same plot.

I have tried doing the same in Excel, but it has a limit of only 255 such lines in a single plot.

The plot I obtained in Excel: (image not available)

Is there a way of doing this in R? I am new to R and therefore don't know much. I think ggplot might work, but I'm having a hard time understanding how to use it on this data.

Any help is greatly appreciated. Thanks.

What is the name of the column containing the organism name? Have you written code to load your data into R? – Jack Brookes, Jan 21 '19 at 01:10
The column was initially named 'names', but then I used that column as the row names. I can revert it if that helps. And yes, I have the data loaded. – user_14, Jan 21 '19 at 01:29
I know this doesn't answer your actual question, but do you have the option to make a different kind of graph? I'm wondering how well a human could distinguish 3,000 different colors on the same graph. What about, say, a scatterplot of `reads_in_day0 - reads_in_rain` against `reads_in_day7 - reads_in_rain` instead? – A. S. K., Jan 21 '19 at 06:41
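
The scatterplot suggested in the last comment could be sketched roughly as follows. This is only an illustration under assumptions: the data frame `reads` and the column names `org`, `diff_day0` and `diff_day7` are hypothetical, and the three rows are the ones shown in the question.

```r
library(ggplot2)

# Hypothetical data frame mirroring the three rows shown in the question.
reads <- data.frame(
  org  = c("org1", "org2", "org3"),
  rain = c(923857, 424002, 2910),
  day0 = c(505062, 198440, 1492),
  day7 = c(503292, 26314, 535)
)

# Change in reads relative to the rain sample.
reads$diff_day0 <- reads$day0 - reads$rain
reads$diff_day7 <- reads$day7 - reads$rain

# One point per organism; no per-organism color is needed.
p_diff <- ggplot(reads, aes(x = diff_day0, y = diff_day7)) +
  geom_point(alpha = 0.5) +
  labs(x = "day0 - rain (reads)", y = "day7 - rain (reads)")

print(p_diff)
```

With one point per organism, the plot does not depend on telling thousands of colors apart.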

score 0 · Answer 1 · answered Jan 21 '19 at 01:48

Here is a version using library(tidyverse).

I created a data.frame based on the data you provided,
gathered the variables to put the data into a long format,
changed the factor levels so that they align with the plot you provided,
and used ggplot to produce a figure.

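library(tidyverse)

# Example data based on the rows shown in the question
data.frame(org = letters[1:3],
           rain = c(923857, 424002, 2910),
           day0 = c(505062, 198440, 1492),
           day7 = c(503292, 26314, 535)) %>%
  # Reshape to long format: one row per organism and stage
  gather(variable, value, -org) %>%
  # Order the stages as rain, day0, day7 on the x-axis
  mutate(variable = factor(variable, levels = c('rain', 'day0', 'day7'))) %>%
  # One line per organism, colored by organism
  ggplot(aes(variable, value, color = org, group = org)) +
  geom_point() +
  geom_line() +
  theme(legend.position = "bottom")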
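
If the organism names are stored as row names rather than in a column (as described in the comments on the question), the reshaping step might look like the sketch below. The data frame `f` is a hypothetical stand-in for the real data, and the column name `org` is an assumption. `pivot_longer()` is the newer tidyr function that supersedes `gather()`; either should give the same long format here.

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for the real data: organism names as row names.
f <- data.frame(
  rain = c(923857, 424002, 2910),
  day0 = c(505062, 198440, 1492),
  day7 = c(503292, 26314, 535),
  row.names = c("org1", "org2", "org3")
)

# Move the row names into a regular column so they survive reshaping.
f_named <- rownames_to_column(f, var = "org")

# Long format: one row per organism and stage.
f_long <- pivot_longer(f_named, cols = -org,
                       names_to = "variable", values_to = "value")

# Fix the stage order so the x-axis reads rain, day0, day7.
f_long$variable <- factor(f_long$variable, levels = c("rain", "day0", "day7"))

str(f_long)
```

The resulting `f_long` can be passed to the same `ggplot()` call as above.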

answered Jan 21 '19 at 01:48

B Williams

Hi, thanks for the answer. This works well when I have a much smaller dataset. But I extrapolated this same code to work on the original >3000-row dataset, and it's not showing any errors, but it's still running after 30 min... No plots yet. Do you think there is a more time-efficient way of doing this? Or should I consider more pre-processing of the data? – user_14 Jan 21 '19 at 03:31
3000 rows isn't a problem - likely has to do with how you are "extrapolating" - how are you doing this? – B Williams Jan 21 '19 at 04:12
I have used the columns from my existing df (called f) for the corresponding data in org, rain, day0 and day7. So essentially, instead of using org = letters[1:3], rain = c(...), and so on, I have used org = f[,1], rain = f[,2], and so on. The rest I have kept the same. – user_14 Jan 21 '19 at 04:29
It's pretty difficult to assist without the actual data in hand - see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for some advice on making it easier for others to help you. – B Williams Jan 21 '19 at 04:47
The actual data is the exact same data I shared in the question with 3000 more rows representing 3000 more organisms. I think since I'm using RStudio, it took a lot of time to process the data and plot. RStudio is slow when it comes to plotting. And also I was not able to view the actual plot since the legend took up all the space. So, I removed the legend and was able to view that the plot actually works, but I am still going to do a little more preprocessing of the data so it becomes more legible. Thanks a lot for your answer. This really helped. – user_14 Jan 21 '19 at 05:13
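
The approach described in the last comment (dropping the legend for a few thousand lines) might be sketched as below. The data is simulated, not the real reads, and the names `sim`, `p_all` and `out_file` are hypothetical. The log scale, the transparency and writing to a file instead of the plot pane are optional choices, not something the thread confirms as the cause of the slowdown.

```r
library(ggplot2)

set.seed(1)
n_org <- 3000
stages <- c("rain", "day0", "day7")

# Simulated long-format data standing in for the real reads (an assumption).
sim <- data.frame(
  org      = rep(paste0("org", seq_len(n_org)), times = length(stages)),
  variable = factor(rep(stages, each = n_org), levels = stages),
  value    = round(10^runif(n_org * length(stages), min = 1, max = 6))
)

p_all <- ggplot(sim, aes(variable, value, group = org, color = org)) +
  geom_line(alpha = 0.3) +            # semi-transparent lines reduce overplotting
  scale_y_log10() +                   # simulated reads span several orders of magnitude
  theme(legend.position = "none")     # a 3000-entry legend would crowd out the plot

# Save to a file rather than drawing in the interactive plot pane.
out_file <- file.path(tempdir(), "reads_by_stage.png")
ggsave(out_file, plot = p_all, width = 8, height = 5, dpi = 150)
```

Because `group = org` already produces one line per organism, the `color = org` mapping could also be dropped if distinct colors are not needed.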
