I have a data frame in which I set a specific order on one column, and I use it to generate a bar plot with ggplot2 in R. My problem is that when I try to use different names for the samples on the x axis with scale_x_discrete, the labels do not correspond to the real sample names.

library(phyloseq)
sample_data(PO)

         X.SampleID  Primer Final_Barcode         SampleType       ID
CL3             CL3 ILBC_01        AACGCA               Soil      cl3
CC1             CC1 ILBC_02        AACTCG               Soil      cc1
SV1             SV1 ILBC_03        AACTGT               Soil      sv1
M31Fcsw     M31Fcsw ILBC_04        AAGAGA              Feces  m31fcsw
M11Fcsw     M11Fcsw ILBC_05        AAGCTG              Feces  m11fcsw
M31Plmr     M31Plmr ILBC_07        AATCGT               Skin  m31plmr
M11Plmr     M11Plmr ILBC_08        ACACAC               Skin  m11plmr
F21Plmr     F21Plmr ILBC_09        ACACAT               Skin  f21plmr
M31Tong     M31Tong ILBC_10        ACACGA             Tongue  m31tong
M11Tong     M11Tong ILBC_11        ACACGG             Tongue  m11tong
LMEpi24M   LMEpi24M ILBC_13        ACACTG         Freshwater lmepi24m
SLEpi20M   SLEpi20M ILBC_15        ACAGAG         Freshwater slepi20m
AQC1cm       AQC1cm ILBC_16        ACAGCA Freshwater (creek)   aqc1cm
AQC4cm       AQC4cm ILBC_17        ACAGCT Freshwater (creek)   aqc4cm
AQC7cm       AQC7cm ILBC_18        ACAGTG Freshwater (creek)   aqc7cm
NP2             NP2 ILBC_19        ACAGTT              Ocean      np2
NP3             NP3 ILBC_20        ACATCA              Ocean      np3
NP5             NP5 ILBC_21        ACATGA              Ocean      np5
TRRsed1     TRRsed1 ILBC_22        ACATGT Sediment (estuary)  trrsed1
TRRsed2     TRRsed2 ILBC_23        ACATTC Sediment (estuary)  trrsed2
TRRsed3     TRRsed3 ILBC_24        ACCACA Sediment (estuary)  trrsed3
TS28           TS28 ILBC_25        ACCAGA              Feces     ts28
TS29           TS29 ILBC_26        ACCAGC              Feces     ts29
Even1         Even1 ILBC_27        ACCGCA               Mock    even1
Even2         Even2 ILBC_28        ACCTCG               Mock    even2
Even3         Even3 ILBC_29        ACCTGT               Mock    even3

This is part of the structure of a phyloseq object (PO), so I change the order of the data frame by the SampleType column as follows:

sample_data(PO)$SampleType <- factor(sample_data(PO)$SampleType, levels = c("Mock", "Skin", "Ocean", "Soil", "Feces", "Tongue", "Freshwater", "Freshwater (creek)",  "Sediment (estuary)" ))
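
For reference, here is a minimal sketch of what setting factor levels does, using a toy vector (hypothetical values, not the real sample table). It changes the level order that ggplot2 uses for discrete scales; the rows themselves stay where they are.

```r
# Toy stand-in for the SampleType column (hypothetical values)
sample_type <- c("Soil", "Feces", "Skin", "Mock")

# Set an explicit level order; any value missing from `levels` would become NA
ordered_type <- factor(sample_type, levels = c("Mock", "Skin", "Soil", "Feces"))

# The stored values are unchanged, only the level order differs
levels(ordered_type)

# Row indices sorted by level order, useful if the rows must be reordered too
order(ordered_type)
```

Checking levels() after the assignment is a quick way to confirm that every sample type was listed and none became NA.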

Then I generate the bar plot with the function phyloseq::plot_bar, which uses ggplot2's geom_bar:

p <- plot_bar(PO, x="Sample", fill ="Family") + 
    geom_bar(stat="identity") + scale_fill_manual(values = MyPalette2) + theme_bw() +
    ggtitle("15 Most Abundant Families") + 
    theme(plot.title = element_text(size = 13, hjust = 0.5, vjust = 0.5, face = "bold")) + 
    ylab("Relative abundance (%)") +
    xlab("Sample") + 
    guides(fill=guide_legend(ncol=1)) +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 1))

This generates the figure I expect (everything is fine up to this point).

So I tried to use scale_x_discrete to get different names, using the ID column as the labels on the x axis.

In this example the new names are just the lower-case sample names, taken from a different column. Lower-casing is not my problem (I know how to do that with ggplot2); the problem is that the labels do not correspond to the samples when I take them from a different column:

p + scale_x_discrete(labels = phyloseq::sample_data(PO)$ID)
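
A likely cause is that an unnamed labels vector is assigned to the axis breaks by position, so it follows the order of the axis while the ID column follows the row order of the sample table. One possible workaround, sketched here on a toy data frame (the data and the names samples, id_labels and p_toy are hypothetical), is to pass a named vector, which ggplot2 matches to the breaks by name:

```r
library(ggplot2)

# Toy stand-in for the sample table (hypothetical values);
# the rows are deliberately not in alphabetical order
samples <- data.frame(
  Sample = c("CL3", "M31Fcsw", "AQC1cm", "Even1"),
  ID = c("cl3", "m31fcsw", "aqc1cm", "even1"),
  Abundance = c(40, 25, 60, 10),
  stringsAsFactors = FALSE
)

# Named lookup: names are the values shown on x, values are the new labels
id_labels <- setNames(samples$ID, samples$Sample)

# Labels are matched to breaks by name, so row order no longer matters
p_toy <- ggplot(samples, aes(x = Sample, y = Abundance)) +
  geom_col() +
  scale_x_discrete(labels = id_labels)
```

For the phyloseq object, the equivalent lookup would presumably be setNames(as.character(sample_data(PO)$ID), sample_names(PO)), assuming the x axis of plot_bar carries the sample names; I have not run this against PO.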

The order of the samples is fine, but the names on the x axis do not correspond to them (marked with a red line in the figure).

How can I generate the plot with the order used here but with the names taken from a different column?

Thanks !!!

abraham
  • The issue is most likely that you use a column of your df for the labels. While there are cases where this will work, I would not recommend doing so as it is error-prone and in general does not ensure that you assign the right labels to the breaks. To fix that, why are you not simply mapping your `ID` column on `x`? – stefan Feb 12 '22 at 10:44
  • If you want great answers quickly, it's best to make your question reproducible. This includes sample data, like that this data is derived from `phyloseq` data `GlobalPatterns` and how it was subset or pruned. The object `MyPalette2` would be a good addition, as well. Check it out: [making R reproducible questions](https://stackoverflow.com/q/5963269). – Kat Feb 12 '22 at 16:17
  • I can’t add the real data because it is part of a scientific paper that I am going to submit in a few days, and I don’t want the journal to object. That is why I used GlobalPatterns; it shows exactly the same problem that I have with my data. Thanks for taking the time! – abraham Feb 13 '22 at 05:31
