I have a data frame in which I set a specific order on one column, and I use it to generate a bar plot with ggplot2 in R. My problem is that when I try to use different names for the samples on the x axis with scale_x_discrete, the labels do not correspond to the real sample names.
library(phyloseq)
sample_data(PO)
          X.SampleID  Primer   Final_Barcode  SampleType          ID
CL3       CL3         ILBC_01  AACGCA         Soil                cl3
CC1       CC1         ILBC_02  AACTCG         Soil                cc1
SV1       SV1         ILBC_03  AACTGT         Soil                sv1
M31Fcsw   M31Fcsw     ILBC_04  AAGAGA         Feces               m31fcsw
M11Fcsw   M11Fcsw     ILBC_05  AAGCTG         Feces               m11fcsw
M31Plmr   M31Plmr     ILBC_07  AATCGT         Skin                m31plmr
M11Plmr   M11Plmr     ILBC_08  ACACAC         Skin                m11plmr
F21Plmr   F21Plmr     ILBC_09  ACACAT         Skin                f21plmr
M31Tong   M31Tong     ILBC_10  ACACGA         Tongue              m31tong
M11Tong   M11Tong     ILBC_11  ACACGG         Tongue              m11tong
LMEpi24M  LMEpi24M    ILBC_13  ACACTG         Freshwater          lmepi24m
SLEpi20M  SLEpi20M    ILBC_15  ACAGAG         Freshwater          slepi20m
AQC1cm    AQC1cm      ILBC_16  ACAGCA         Freshwater (creek)  aqc1cm
AQC4cm    AQC4cm      ILBC_17  ACAGCT         Freshwater (creek)  aqc4cm
AQC7cm    AQC7cm      ILBC_18  ACAGTG         Freshwater (creek)  aqc7cm
NP2       NP2         ILBC_19  ACAGTT         Ocean               np2
NP3       NP3         ILBC_20  ACATCA         Ocean               np3
NP5       NP5         ILBC_21  ACATGA         Ocean               np5
TRRsed1   TRRsed1     ILBC_22  ACATGT         Sediment (estuary)  trrsed1
TRRsed2   TRRsed2     ILBC_23  ACATTC         Sediment (estuary)  trrsed2
TRRsed3   TRRsed3     ILBC_24  ACCACA         Sediment (estuary)  trrsed3
TS28      TS28        ILBC_25  ACCAGA         Feces               ts28
TS29      TS29        ILBC_26  ACCAGC         Feces               ts29
Even1     Even1       ILBC_27  ACCGCA         Mock                even1
Even2     Even2       ILBC_28  ACCTCG         Mock                even2
Even3     Even3       ILBC_29  ACCTGT         Mock                even3
This is part of the structure of a phyloseq object (PO). I change the order of the data frame through the SampleType column as follows:
sample_data(PO)$SampleType <- factor(sample_data(PO)$SampleType, levels = c("Mock", "Skin", "Ocean", "Soil", "Feces", "Tongue", "Freshwater", "Freshwater (creek)", "Sediment (estuary)" ))
Then I generate the bar plot with the function phyloseq::plot_bar, which uses ggplot2's geom_bar:
p <- plot_bar(PO, x="Sample", fill ="Family") +
geom_bar(stat="identity") + scale_fill_manual(values = MyPalette2) + theme_bw() +
ggtitle("15 Most Abundant Families") +
theme(plot.title = element_text(size = 13, hjust = 0.5, vjust = 0.5, face = "bold")) +
ylab("Relative abundance (%)") +
xlab("Sample") +
guides(fill=guide_legend(ncol=1)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 1))
This generates the following figure (everything is fine up to here):
So I tried to use scale_x_discrete to get different names on the x axis, taking them from the ID column.
In this example the ID column just holds the sample names in lower case, to illustrate using a different column for the labels. Lower-casing itself is not my problem; I know how to do that with ggplot2. The problem is that the labels no longer correspond to the samples when I take the names from a different column:
p + scale_x_discrete(labels = phyloseq::sample_data(PO)$ID)
The order of the samples is fine, but the names on the x axis do not correspond to the samples (red line!).
How can I generate the plot with the order used here, but with the names taken from a different column?
Thanks !!!
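One possible explanation, stated as an assumption rather than a confirmed diagnosis: an unnamed vector passed to labels in scale_x_discrete is applied to the axis breaks by position, and the breaks follow the order of the plotted x values, which need not be the row order of sample_data(PO). A named vector is matched by name instead. The sketch below uses a toy data frame in place of the phyloseq object; meta, df, and id_labels are hypothetical names, and the abundance values are made up for illustration.

```r
library(ggplot2)

# Toy stand-in for the sample metadata (a few rows copied from the table above)
meta <- data.frame(
  X.SampleID = c("CL3", "M31Plmr", "Even1", "NP2"),
  ID         = c("cl3", "m31plmr", "even1", "np2"),
  stringsAsFactors = FALSE
)

# Toy abundances (made-up numbers); the factor levels deliberately differ
# from the row order of `meta`, as can happen with the axis of a real plot
df <- data.frame(
  Sample    = factor(meta$X.SampleID,
                     levels = c("Even1", "M31Plmr", "NP2", "CL3")),
  Abundance = c(10, 20, 30, 40)
)

# Named vector: names are the values on the axis, elements are the labels
id_labels <- setNames(meta$ID, meta$X.SampleID)

# Labels are looked up by name, so each bar gets the ID of its own sample
p <- ggplot(df, aes(x = Sample, y = Abundance)) +
  geom_col() +
  scale_x_discrete(labels = id_labels)
```

Applied to the plot above, and assuming the x values are the sample names of PO, the same idea would read p + scale_x_discrete(labels = setNames(as.character(sample_data(PO)$ID), sample_names(PO))). This has not been run against the data in this post.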