
I have a large data set with protein IDs and corresponding abundance profiles across a number of gel fractions. I want to plot these abundance profiles across the fractions.

The data looks like this:

```r
IDs <- c("prot1", "prot2", "prot3", "prot4")
fraction1 <- c(3, 4, 2, 4)
fraction2 <- c(1, 2, 4, 1)
fraction3 <- c(6, 4, 6, 2)
plotdata <- data.frame(IDs, fraction1, fraction2, fraction3)
```

```
> plotdata
    IDs  fraction1  fraction2  fraction3
1 prot1          3          1          6
2 prot2          4          2          4
3 prot3          2          4          6
4 prot4          4          1          2
```

I want it to look like this: [image: a plot made in Excel of protein abundances]

Each protein has its own profile: for every fraction there is one abundance value per protein. I want to show several proteins in the same plot.

I tried to figure out ggplot2 using the cheat sheet, but failed. I don't know what shape the input data frame should have or which method I should use to draw these profiles.

I would use Excel, but a bug draws the wrong profile depending on the order of the data, so I can't trust it to do what I want.

Joram
  • The data you provided doesn't have gel fraction information – pogibas Jan 26 '18 at 11:25
  • edited for clarification. – Joram Jan 26 '18 at 11:29
  • Are the different colors in Excel the different proteins, or are the different proteins on the x-axis? It might be easier if you describe more clearly how the plot is created or add the output of `dput(yourdata)` to your post. See also here: [how to make a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – kath Jan 26 '18 at 11:42

1 Answer


First, you'll have to reshape your data.frame into long format for ggplot2. You can do it in one step with reshape2::melt, which also lets you change the default 'variable' and 'value' column names (via its variable.name and value.name arguments).

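```r
library(reshape2)
library(dplyr)
library(ggplot2)

# Wide to long: one row per protein/fraction pair, with the default
# columns 'variable' (the fraction) and 'value' (the abundance)
data2 <- melt(plotdata, id.vars = "IDs")
```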
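To see what ggplot2 expects as input, here is a minimal sketch of the same melt call with the column names changed through variable.name and value.name. The names fraction and abundance are illustrative choices, not part of the original answer; the data is the example from the question.

```r
library(reshape2)

# Example data from the question: one column per gel fraction (wide format)
plotdata <- data.frame(
  IDs       = c("prot1", "prot2", "prot3", "prot4"),
  fraction1 = c(3, 4, 2, 4),
  fraction2 = c(1, 2, 4, 1),
  fraction3 = c(6, 4, 6, 2),
  stringsAsFactors = FALSE
)

# Long format: one row per protein/fraction pair (4 proteins x 3 fractions = 12 rows).
# 'fraction' and 'abundance' replace the default names 'variable' and 'value'.
long <- melt(plotdata, id.vars = "IDs",
             variable.name = "fraction", value.name = "abundance")

print(head(long))
```

With these names, the plot call would use aes(fraction, abundance, group = IDs, color = IDs). In newer code, tidyr::pivot_longer() does the same wide-to-long reshaping.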

Then, we'll group the data by protein:

```r
data2 <- group_by(data2, IDs)
```

Finally, you can plot it quite simply:

```r
ggplot(data2) +
    geom_line(aes(variable, value, group = IDs,
                  color = IDs))
```
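A slightly extended sketch of the same plot, assuming you also want a point at each measured fraction and readable axis titles. geom_point() and labs() are my additions, not part of the original answer; moving the aesthetics into ggplot() lets both layers share them.

```r
library(reshape2)
library(ggplot2)

# Example data from the question
plotdata <- data.frame(
  IDs       = c("prot1", "prot2", "prot3", "prot4"),
  fraction1 = c(3, 4, 2, 4),
  fraction2 = c(1, 2, 4, 1),
  fraction3 = c(6, 4, 6, 2),
  stringsAsFactors = FALSE
)

data2 <- melt(plotdata, id.vars = "IDs")

# One line per protein (group = IDs); points mark the individual fractions
p <- ggplot(data2, aes(x = variable, y = value, group = IDs, color = IDs)) +
  geom_line() +
  geom_point() +
  labs(x = "Fraction", y = "Abundance", color = "Protein")

print(p)
```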
csgroen
  • Thank you so much, I did not find this anywhere. Saved my day. – Joram Jan 26 '18 at 11:48
  • I used your example on my real data, and now the y-axis scale is messed up: instead of running from 0 to the maximum, it places every occurring value at equal spacing between the minimum and the maximum. What is the command to make it a continuous scale? Example: https://imgur.com/a/B4nEu – Joram Jan 26 '18 at 12:02
  • I mean `scale_y_continuous` – csgroen Jan 26 '18 at 12:08
  • I also had the problem that my numeric values were stored as characters, which leads to a discrete scale instead of a continuous one. Using `transform(df, values = as.numeric(values))`, I converted them to numeric, which resolved the scale issue. My plot is really beautiful now, thanks again for the help. – Joram Jan 26 '18 at 13:28
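A minimal sketch combining both fixes from the comments, assuming the abundances were read in as text. The character columns below are hypothetical, made up to reproduce the problem, and the sketch uses melt's default column name value rather than the values used in the comment. The limits = c(0, NA) setting is my suggestion for the "from 0 to max" request, not something stated in the thread.

```r
library(reshape2)
library(ggplot2)

# Hypothetical input: abundances stored as text, as can happen when a file
# column contains a non-numeric entry
plotdata <- data.frame(
  IDs       = c("prot1", "prot2", "prot3", "prot4"),
  fraction1 = c("3", "4", "2", "4"),
  fraction2 = c("1", "2", "4", "1"),
  fraction3 = c("6", "4", "6", "2"),
  stringsAsFactors = FALSE
)

data2 <- melt(plotdata, id.vars = "IDs")
print(class(data2$value))  # "character": ggplot2 would draw a discrete y axis

# Convert to numeric; as.character() first guards against factor columns,
# where as.numeric() alone would return the level codes instead of the values
data2 <- transform(data2, value = as.numeric(as.character(value)))

p <- ggplot(data2, aes(variable, value, group = IDs, color = IDs)) +
  geom_line() +
  # Continuous y axis starting at 0; NA keeps the data maximum as the upper limit
  scale_y_continuous(limits = c(0, NA))

print(p)
```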