I'm using ggplot to plot the relative abundance of microbiome data. My problem is that the y-axis does not stop at 100%: the bars reach values such as 150 or 80.

Here is the code I used to get the relative abundance. I limited the data to the top 100 taxa by abundance, then used transform_sample_counts:

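library(phyloseq)

# Names of the 100 most abundant taxa across all samples
top100 <- names(sort(taxa_sums(ps_oral), decreasing = TRUE))[1:100]
ps_oral_top100 <- prune_taxa(top100, ps_oral)
ntaxa(ps_oral_top100)
# Convert counts to per-sample proportions (each sample sums to 1)
ps_oral_ra <- transform_sample_counts(ps_oral_top100, function(x) x / sum(x))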

Here is the code I used to plot:

# Relative abundance using ps_oral_ra, at the Phylum level
ggplot(data = psmelt(ps_oral_ra),
       mapping = aes(x = GroupDay, y = Abundance, color = Phylum, fill = Phylum)) +
  geom_col() +
  labs(x = "", y = "Relative Abundance\n") +
  theme_classic()

# Relative abundance using ps_oral_ra, at the Genus level
ggplot(data = psmelt(ps_oral_ra),
       mapping = aes(x = GroupDay, y = Abundance, color = Genus, fill = Genus)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  labs(x = "", y = "Relative Abundance\n")

Another question: how can I limit Genus to only the top 10 or 20?

Thank you for the help.

For a reproducible example, I used dput(sample_data(ps_oral)) [1:10, 1:6]:

Sample Data: [10 samples by 6 sample variables]:
                 Lane  Salivette  MasterID  Day  Group      GroupDay
X1.30031_S31     L2    1          30031     Pre  Probiotic  Baseline
X10.3012_S167    L2    10         3012      Pre  Probiotic  Baseline
X100.3109_S46    L2    100        3109      Pre  Placebo    Baseline
X101.3110_S43    L1    101        3110      Pre  Probiotic  Baseline
X102.3111_S64    L1    102        3111      Pre  Placebo    Baseline
X103.3112_S119   L2    103        3112      Pre  Placebo    Baseline
X104.3114_S115   L2    104        3114      Pre  Probiotic  Baseline
X105.3115_S119   L1    105        3115      Pre  Probiotic  Baseline
X106.3116_S143   L2    106        3116      Pre  Placebo    Baseline
X107.3117_S184   L2    107        3117      Pre  Placebo    Baseline

1 Answer

It's a bit tricky to fix without having sample data, but you can probably add:

ggplot [...] + [...] +
  scale_y_continuous(limits = c(0, 100))
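Capping the axis may only hide the symptom. One likely cause, assuming each GroupDay holds many samples, is that geom_col() stacks every row of the melted table, so a bar's height is the sum over all samples in the group rather than a proportion. A minimal sketch of that effect and of averaging per group instead, using a toy data frame as a hypothetical stand-in for psmelt(ps_oral_ra) (the taxa, sample names, and values are made up):

```r
# Toy stand-in for psmelt(ps_oral_ra): 2 samples, 3 OTUs, 2 phyla (made-up values)
df <- data.frame(
  OTU       = rep(c("otu1", "otu2", "otu3"), times = 2),
  Sample    = rep(c("s1", "s2"), each = 3),
  GroupDay  = "Baseline",
  Phylum    = rep(c("Firmicutes", "Firmicutes", "Bacteroidetes"), times = 2),
  Abundance = c(0.5, 0.25, 0.25, 0.25, 0.25, 0.5)
)

# Height of the stacked bar for the group: one unit per sample, not 1 in total
group_total <- sum(df$Abundance)

# Step 1: add up OTUs within each sample and phylum
per_sample <- aggregate(Abundance ~ Sample + GroupDay + Phylum, data = df, FUN = sum)

# Step 2: average across the samples of each group
per_group <- aggregate(Abundance ~ GroupDay + Phylum, data = per_sample, FUN = mean)

# The averaged proportions of a group add up to 1 again
mean_total <- sum(per_group$Abundance)

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(per_group,
                       ggplot2::aes(x = GroupDay, y = Abundance, fill = Phylum)) +
    ggplot2::geom_col()
}
```

With the real data, the same two aggregation steps would be applied to the output of psmelt(ps_oral_ra) before plotting. Whether a per-group mean is the right summary depends on the study design.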
ktiu
  • Thank you for the fast response. When I add that, I get: "Warning message: Removed 10923 rows containing missing values (geom_col)." and the bars don't all reach 100 to represent relative abundance. As for sample data, I have three groups in "GroupDay". Thanks – Mashael Aljumaah Jun 22 '21 at 15:48
  • It would be helpful if you could provide us with a reproducible [minimal working example](https://en.wikipedia.org/wiki/Minimal_working_example) that we can copy and paste to better understand the issue and test possible solutions. You can share datasets with `dput(YOUR_DATASET)` or smaller samples with `dput(head(YOUR_DATASET))`. (See [this answer](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#5963610) for some great advice.) – ktiu Jun 22 '21 at 15:53
  • I added a reproducible example, thank you – Mashael Aljumaah Jun 22 '21 at 17:40
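
On the second question (limiting Genus to the top 10 or 20), one possible approach is to rank genera by total abundance in the melted table and relabel everything outside the top N as "Other" before plotting. A minimal sketch on a toy data frame, again a hypothetical stand-in for psmelt(ps_oral_ra); the genus names, values, and the helper column GenusTop are illustrative assumptions:

```r
# Toy stand-in for psmelt(ps_oral_ra) with a Genus column (made-up values)
df <- data.frame(
  Sample    = rep(c("s1", "s2"), each = 4),
  Genus     = rep(c("Streptococcus", "Veillonella", "Prevotella", "Rothia"), times = 2),
  Abundance = c(0.5, 0.25, 0.125, 0.125, 0.5, 0.25, 0.125, 0.125)
)

n_top <- 2  # use 10 or 20 on real data

# Total abundance per genus, largest first
genus_totals <- sort(tapply(df$Abundance, as.character(df$Genus), sum),
                     decreasing = TRUE)
top_genera <- names(genus_totals)[seq_len(n_top)]

# Keep the top genera by name and lump the rest together
df$GenusTop <- ifelse(df$Genus %in% top_genera, as.character(df$Genus), "Other")
```

Using fill = GenusTop in aes() would then limit the legend to the top genera plus "Other", while each sample's proportions still add up to 1.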