How to make XY line plots with ggplot

Question

I have data that looks like this:

id  description condition   37  40  43  46  49  52  55  58  61  64
A0A0U1RRE5  Negative regulator of P-body association OS=Homo sapiens OX=9606 GN=NBDY PE=1 SV=1  drug    1   1.1041226863016411  0.9437029927040576  1.133936979806517   0.8982803168215154  0.7035497135828203  0.46722020612331583 0.5594182618003255  0.48512236252537605 0.7310106673958117
A0A0U1RRE5  Negative regulator of P-body association OS=Homo sapiens OX=9606 GN=NBDY PE=1 SV=1  untreated   1   0.9450343635662249  0.8363405733910056  0.9532277999468575  0.8581424910783204  0.7917800302889262  0.7745810643326428  0.6048898019322693  0.7380694079960117  0.5978993627096959
A0A1B0GWB2  Proline rich transmembrane protein 1B OS=Homo sapiens OX=9606 GN=PRRT1B PE=3 SV=1   untreated   0.9999999999999999  1.5229192821972406  1.1398828388275444  1.2290176893822775  1.0580728822363383  0.8879357870221047  0.6642457405807493  0.5400754152380155  0.5681676616756307  0.17638645150459156
A0AV96  RNA-binding protein 47 OS=Homo sapiens OX=9606 GN=RBM47 PE=1 SV=2   drug    1   1.1359020588355402  0.7979585326115394  0.5762235578663404  0.3292590168348695  0.20514611170749722 0.15713654146366826 0.09716732237925005 0.09612454080882395 0.099309468934591
A0AV96  RNA-binding protein 47 OS=Homo sapiens OX=9606 GN=RBM47 PE=1 SV=2   untreated   1   0.9229166764455238  0.8433271037981647  0.4533284181354084  0.4555246688979719  0.4143488607289736  0.34695355056189686 0.3770836050953272  0.4779148623352096  0.3373909906384538
A0AVF1  Intraflagellar transport protein 56 OS=Homo sapiens OX=9606 GN=TTC26 PE=1 SV=1  untreated   0.9999999999999999  0.9177779500022252  0.5998866619497241  0.521926454874236   0.2579381653212965  0.1996295914616442  0.14649190575097334 0.07823081931021865 0.027281536990687902    0.05640264437646823
A0AVI2  Fer-1-like protein 5 OS=Homo sapiens OX=9606 GN=FER1L5 PE=2 SV=3    drug    1   1.024105655638753   0.8197303752878325  0.9964535420876113  0.8386928364435612  0.6462570275048929  0.2973443433453433  0.3627837836782134  0.3228359285265014  0.2882232683577441
A0AVI2  Fer-1-like protein 5 OS=Homo sapiens OX=9606 GN=FER1L5 PE=2 SV=3    untreated   1   1.1179353874768374  0.878599850307355   0.7990611170397348  0.7812781943314279  0.47258075106250047 0.4548412063813775  0.38089154715422585 0.39198901572993927 0.32612692511437963
A0AVT1  Ubiquitin-like modifier-activating enzyme 6 OS=Homo sapiens OX=9606 GN=UBA6 PE=1 SV=1   drug    1   0.9073477805444234  0.9245932811239219  0.8110899049291647  0.620639256434014   0.27296848421632497 0.1469783913269253  0.09839966259964415 0.09114140372017578 0.08466504683707554
A0AVT1  Ubiquitin-like modifier-activating enzyme 6 OS=Homo sapiens OX=9606 GN=UBA6 PE=1 SV=1   untreated   0.9999999999999999  0.958583600124457   0.9186015039073129  0.7931103298371501  0.6371543158050909  0.2961612776361143  0.1906517234355597  0.1262756628460495  0.08644015832545016 0.10189839301420821
A0JP26  POTE ankyrin domain family member B3 OS=Homo sapiens OX=9606 GN=POTEB3 PE=2 SV=2    drug    1   1.0329709366665525  0.992608905953596   0.856823167325711   0.5637112407358972  0.33733248730437276 0.22063176400267687 0.12369974765881786 0.09208341140527972 0.07783909884307255
A0MZ66  Shootin-1 OS=Homo sapiens OX=9606 GN=SHTN1 PE=1 SV=4    drug    1   0.942182620961541   0.8933816163905693  0.7777887922731646  0.4365328174220588  0.2411458853957169  0.14671320189828937 0.10003370076790084 0.08311532684604751 0.07053079910942059
A0MZ66  Shootin-1 OS=Homo sapiens OX=9606 GN=SHTN1 PE=1 SV=4    untreated   1   1.0507118796382515  0.9511041343416113  0.7614709836519069  0.41045648542541546 0.2548212225189782  0.1568404913423995  0.11554599778975412 0.0957489674403767  0.07388379958579645
A1L168  Uncharacterized protein C20orf202 OS=Homo sapiens OX=9606 GN=C20orf202 PE=2 SV=2    drug    1   1.0177762016484024  1.1452648937926244  1.0292107635136092  1.1021902026825343  0.8755071127032578  0.8727110149449968  0.678405675478259   0.4470035933443866  0.2227701755434357
A1L168  Uncharacterized protein C20orf202 OS=Homo sapiens OX=9606 GN=C20orf202 PE=2 SV=2    untreated   0.9999999999999999  0.8869879800666665  0.8571212043897039  0.799132206840635   0.6611639502929558  0.45436526806634636 0.41151924668975715 0.4222579162768429  0.21587998836109556 0.14049385962865726
A1L170-2    Isoform 2 of Uncharacterized protein C1orf226 OS=Homo sapiens OX=9606 GN=C1orf226   drug    1   1.1771502066945339  1.1554256804631469  0.9831531089833269  0.8817312939741198  0.8801027980516507  0.5490979375781604  0.480002886750302   0.3081868284119481  0.1734711402541223
A1L188  NADH dehydrogenase [ubiquinone] 1 alpha subcomplex assembly factor 8 OS=Homo sapiens OX=9606 GN=NDUFAF8 PE=1 SV=1   drug    1   0.9481522077545439  0.9167559838672549  1.1813733404601798  0.9757971570882301  0.8692831284897276  0.46372836218479535 0.41538743231155434 0.5000570501302102  0.3307103636933478
A1L188  NADH dehydrogenase [ubiquinone] 1 alpha subcomplex assembly factor 8 OS=Homo sapiens OX=9606 GN=NDUFAF8 PE=1 SV=1   untreated   0.9999999999999999  0.7567227226469959  1.1546100072878485  1.1228998624576403  0.8106628052955032  0.7205380566818719  0.6329344935500069  0.6296167144485068  0.5788430457154651  0.3743084581347437
A1L443  NUT family member 2F OS=Homo sapiens OX=9606 GN=NUTM2F PE=2 SV=2    untreated   1   0.9225264366435793  0.7259445225641892  0.8432502973463032  0.9244266118409786  0.7048844805190569  0.515751537377487   0.528851434803388   0.6057279971957953  0.3866243515228186
A1L4Q6  Putative uncharacterized protein FLJ41423 OS=Homo sapiens OX=9606 PE=5 SV=1 untreated   1   1.138067698480128   1.0672924356082634  1.1019660785262646  0.8802541380877005  0.7067984040013714  0.6203403582082649  0.47540557870614286 0.40258331345237003 0.472781485779022
A1X283  SH3 and PX domain-containing protein 2B OS=Homo sapiens OX=9606 GN=SH3PXD2B PE=1 SV=3   drug    1   0.9375327651356732  0.9547627667705975  0.8325692403349169  0.3875712143907674  0.20646883295604557 0.16169523207872427 0.10118464452798918 0.07664939373563846 0.06558003087170021
A1X283  SH3 and PX domain-containing protein 2B OS=Homo sapiens OX=9606 GN=SH3PXD2B PE=1 SV=3   untreated   1   1.017888633242132   0.9214756809155263  0.7959384639363979  0.4659202192690397  0.24537661413900397 0.1895702146296479  0.1451987384557108  0.1193681947622023  0.09446883627284913

So I have different proteins, each with 10 values measured at different temperatures. My x will be the temperatures (37, 40 and so on) and my y will be the values. I was looking for a way to do this with ggplot, but I couldn't find anything similar; the examples always have just one x column and one y column. Do I need to transpose my data to be able to do it? I also wanted something like this pic, where I can have the mean of each treatment across all proteins together, with a shaded band showing the whole distribution. The other thing I need is a way to plot individual ids, or to show individual ids in the middle of all the others. What is the best way to do it? Any tips on where I should look?

Your example is really not easy to work with. Consider presenting it in a way that lets people copy/paste it into their R session. See this link for some advice: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — dc37, Apr 21 '20 at 20:53
Or perhaps just give us *fake data* that mimics the structure you have here. Most questions can often start with `set.seed(42); dat <- data.frame(x=runif(...), y=rbeta(...), z=sample(letters,...))` and still help you figure out how to do what you need. — r2evans, Apr 21 '20 at 20:57

score 0 · Answer 1 · answered Apr 22 '20 at 19:36

As others have mentioned, it's difficult to help you 100% without a sample dataset, but I believe from your question you're looking for suggestions on a place to start. I know you posted your data, but it's not in a form that's easily imported. For future questions, I suggest posting the output of dput(your.data.frame).

Your Dataset

I would prefer to work with your dataset, but it's not easy to use in the form in which it is posted. I'll recreate something similar here (I would not normally organize it like this, but I'm trying to show the approach using an example dataset similar to your own):

# creating the datapoints with "temp" as the header.  transposing a matrix to get there...
# sapply() returns a 10 x 8 matrix (one column per id); t() turns it into 8 rows x 10 temps
ex.data <- as.data.frame(
    t( sapply(1:8, function(i) {log(1:10)*i*0.1}) )
)

# adding "names" (just numbers), and finishing the dataframe
names(ex.data) <- seq(37,64, by=3)
rawdata <- data.frame(id=LETTERS[1:8], ex.data)

Giving us this (first six of the eight rows shown):

  id X37        X40       X43       X46       X49       X52       X55       X58       X61       X64
1  A   0 0.06931472 0.1098612 0.1386294 0.1609438 0.1791759 0.1945910 0.2079442 0.2197225 0.2302585
2  B   0 0.13862944 0.2197225 0.2772589 0.3218876 0.3583519 0.3891820 0.4158883 0.4394449 0.4605170
3  C   0 0.20794415 0.3295837 0.4158883 0.4828314 0.5375278 0.5837730 0.6238325 0.6591674 0.6907755
4  D   0 0.27725887 0.4394449 0.5545177 0.6437752 0.7167038 0.7783641 0.8317766 0.8788898 0.9210340
5  E   0 0.34657359 0.5493061 0.6931472 0.8047190 0.8958797 0.9729551 1.0397208 1.0986123 1.1512925
6  F   0 0.41588831 0.6591674 0.8317766 0.9656627 1.0750557 1.1675461 1.2476649 1.3183347 1.3815511

First of all, you will want to change your data around to make it easier to work with ggplot. ggplot likes to deal with what is referred to as "Tidy Data". That link is a great resource for further information, and you can also read about it in this article.

Oh, and see those "X"s in front of the temperatures? We'll fix that later.
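The "X" prefix appears because data.frame() passes column names through make.names() by default, and a name that starts with a digit is not syntactically valid in R. As a minimal sketch, assuming you build the data frame yourself, passing check.names = FALSE keeps the original names (the tiny `ex` table below is an illustrative stand-in, not the real data):

```r
# Illustrative stand-in with temperature-like column names
ex <- as.data.frame(matrix(1:6, nrow = 2))
names(ex) <- c("37", "40", "43")

# Default: names are made syntactically valid, so "37" becomes "X37"
default_names <- names(data.frame(id = c("A", "B"), ex))

# check.names = FALSE leaves the names untouched
kept_names <- names(data.frame(id = c("A", "B"), ex, check.names = FALSE))

print(default_names)
print(kept_names)
```

Even with the names preserved, they are still character strings once reshaped to long format, so the conversion to numeric is needed either way.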

How to make your data "Tidy"

How do you convert your data into "tidy data"? Well, consider if we were to plot every point of your data on a scatter plot, we only need 3 variables to "explain" the data:

id - the name of the protein
temp - the temperature of the reading
value - the value of the reading at that temperature

With that in mind, we should convert your dataset of 11 columns into 3. We already have an "id" column, so that's not a problem. The other 10 columns contain the results of "temp" and "value" mixed together. We can "gather" all that information together using the gather function from tidyr. Since "id" is not involved, we'll make sure to exclude that column from the gather function:

library(tidyr); library(dplyr)  # gather()/separate() and select(), plus the %>% pipe
df <- rawdata %>% gather(key='temp', value='value', -id)

This gives you a "long" dataset that's easy to plot with ggplot2. One remaining issue is that the temperature is not a number: the values are character strings such as "X37" when you want the number 37. It's an easy fix, but I wanted to leave it in because I run into this very often when cleaning up data. The approach here is to split off the leftmost character (the "X"), then convert the column to numeric:

df <- df %>%
    separate(col=temp, into=c('toss', 'temp'), sep=1) %>%
    select(-toss)  # get rid of this garbage
df$temp <- as.numeric(df$temp)

Now you're ready to plot.
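As an alternative to the gather/separate steps, recent versions of tidyr offer pivot_longer(), which supersedes gather() and can strip the "X" prefix during the reshape. The following is a sketch only; the small `rawdata` here is an illustrative stand-in with made-up values, so swap in your own data frame:

```r
library(tidyr)

# Illustrative stand-in for `rawdata` (values are made up)
rawdata <- data.frame(
  id  = c("A", "B"),
  X37 = c(0, 0),
  X40 = c(0.07, 0.14),
  X43 = c(0.11, 0.22)
)

# Reshape every column except `id` into temp/value pairs,
# dropping the leading "X" from the former column names
df <- pivot_longer(
  rawdata,
  cols = -id,
  names_to = "temp",
  values_to = "value",
  names_prefix = "X"
)

# The temperatures are still character strings, so convert them
df$temp <- as.numeric(df$temp)
print(df)
```

The result has the same three columns (id, temp, value) as the gather-based version, so the plotting code works unchanged.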

Making a plot

The approach to plotting with ggplot2 is to consider:

What are my axes? - here we're plotting temperature on the x and "value" on the y.
How do I represent the data? - here I want to show lines, but making sure to group them by the "id".

Take it stepwise, and first just generate a combined plot, separating out by "id". At this point, color isn't even included:

library(ggplot2)
ggplot(df, aes(x=temp, y=value, group=id)) +
    geom_line() + theme_bw()

Ugly, but functional. group= is there to make sure the lines are drawn separately for each "id". If you want to color each line separately, you can use aes(color=id) in the geom_line call. If you want to approach something like what you've shown, you can create those faceted plots by using facet_wrap and gghighlight (from the gghighlight package):

library(gghighlight)
ggplot(df, aes(x=temp, y=value, group=id)) +
    geom_line(color='red') +
    theme_bw() +
    gghighlight(use_direct_label = FALSE) + facet_wrap(~ id, nrow=2)

Visuals can be adjusted to your liking, but that's the general approach I would take, and should showcase how I would approach your question from raw data to final plot.

Combining the faceted plot above with another plot (like you see in the picture) is possible using ggarrange (from the ggpubr package), but I won't go into that here (need to update R version etc etc...). The documentation should help you out and there are various examples of that sort of thing here on SO.
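For the other part of the question (one mean line per treatment with a shaded band), one possible starting point is to summarise the long data per condition and temperature, then draw geom_ribbon under geom_line. This is only a sketch under stated assumptions: the data in `long` is simulated, the names `long` and `summ` are hypothetical, and mean ± 1 standard deviation is just one choice of band (min/max or quantiles would work the same way):

```r
library(dplyr)
library(ggplot2)

set.seed(1)
# Simulated long-format data: 6 ids x 2 conditions x 10 temperatures
long <- expand.grid(
  id = paste0("P", 1:6),
  condition = c("drug", "untreated"),
  temp = seq(37, 64, by = 3)
)
long$value <- 1 / (1 + exp((long$temp - 52) / 3)) + rnorm(nrow(long), sd = 0.05)

# One row per condition and temperature: the mean and a +/- 1 SD band
summ <- long %>%
  group_by(condition, temp) %>%
  summarise(
    mean = mean(value),
    lo = mean(value) - sd(value),
    hi = mean(value) + sd(value),
    .groups = "drop"
  )

# Shaded band first, then the mean line on top
p <- ggplot(summ, aes(x = temp, color = condition, fill = condition)) +
  geom_ribbon(aes(ymin = lo, ymax = hi), alpha = 0.2, color = NA) +
  geom_line(aes(y = mean)) +
  theme_bw()
print(p)
```

With the real data, `condition` would first have to be kept as an id column during the reshape (for example by excluding both id and condition from the gather step).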

Hej, thank you very much, it worked smoothly. That's what I was wondering about with tidy data: I thought I had to change my data to get this done but had no idea where to start. Thanks for the links and clarification, it helped a lot. Just one quick question: what if I want just one plot with everything in gray and, let's say, 3 selected ids in red? Which is the best way to do it? I don't want many plots with every single id highlighted. — Ander Daniel, Apr 23 '20 at 16:23
Two ways that would work: set up `ggplot(...aes(x,y,group))`, then (1) add `geom_line(color='gray')` and then add another `geom_line`, but specify `data=subset(df, id=="name1" | id=="name2" | ...)`, and `color="red"`. Or (2) you could also do this in one `geom_line` call, with `aes(color=highlight)`, which means adding a column to your dataset (`df$highlight`) with "yes" and "no", then specifying colors with `scale_color_manual(values=c('yes'='red', 'no'='gray'))` — chemdork123, Apr 23 '20 at 16:35
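The two approaches from the comment above can be sketched as follows. The data frame rebuilds the answer's example values in long form, and the ids in `picked` are hypothetical placeholders for whichever ids you want in red:

```r
library(ggplot2)

# Long-format example data mirroring the answer's log(1:10) * i * 0.1 values
df <- expand.grid(id = LETTERS[1:8], temp = seq(37, 64, by = 3),
                  stringsAsFactors = FALSE)
df$value <- log((df$temp - 34) / 3) * match(df$id, LETTERS) * 0.1

picked <- c("B", "E", "H")  # hypothetical ids to highlight

# (1) Two layers: all lines in gray, then the picked ids again in red on top
p1 <- ggplot(df, aes(x = temp, y = value, group = id)) +
  geom_line(color = "gray") +
  geom_line(data = subset(df, id %in% picked), color = "red") +
  theme_bw()

# (2) One layer: a yes/no column mapped to color, with manual colors
df$highlight <- ifelse(df$id %in% picked, "yes", "no")
p2 <- ggplot(df, aes(x = temp, y = value, group = id, color = highlight)) +
  geom_line() +
  scale_color_manual(values = c(yes = "red", no = "gray")) +
  theme_bw()

print(p1)
print(p2)
```

`id %in% picked` is a shorter equivalent of chaining `id == "name1" | id == "name2"` comparisons, and it scales better when the list of ids grows.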
