
I am trying to draw a boxplot of the Shannon diversity index grouped by two columns (SampleType and Compartment), but I get the error below even though the Compartment column is present in the metadata file. I would be thankful for your time and help.

Could you please suggest how I can plot it with ggplot2?

design <-read.delim("metadata.txt", sep="\t", header=TRUE)

                  X Treatment Compartment   Villages   Region Season
1 Root-10.S75.L001        T1        Root     Matura Rajshahi    Dry
2       Root.11.S5        T1        Root   Gullapur Rajshahi    Dry
3      Root.12.S16        T1        Root  Maidaipur Rajshahi    Dry
4  Root-13.S5.L001        T1        Root      Gokul Rajshahi    Dry
5 Root-14.S16.L001        T1        Root       Jiol Rajshahi    Dry
6      Root.15.S26        T1        Root Samastipur Rajshahi    Dry
    SampleType
1 Rajshahi.Dry
2 Rajshahi.Dry
3 Rajshahi.Dry
4 Rajshahi.Dry
5 Rajshahi.Dry
6 Rajshahi.Dry


rownames(design) <-design[,1]
dim(design)
design
shannon <- read.delim("16s.shannon.txt",sep = "\t", row.names=1, header=T, blank.lines.skip = FALSE)
colnames(shannon)
dim(shannon)
shannon_info <- cbind(design,shannon)
design <-read.delim("metadata.txt", sep="\t", header=TRUE)
rownames(design) <-design[,1]
dim(design)
design
shannon <- read.delim("16s.shannon.txt",sep = "\t", row.names=1, header=T, blank.lines.skip = FALSE)
> head (shannon)
                               Shannon
PN0086A.Exp2.Root.1.S28.L001  3.078570
PN0086A.Exp2.Root.10.S32.L001 4.958543
PN0086A.Exp2.Root.13.S33.L001 5.157430
PN0086A.Exp2.Root.14.S34.L001 4.763404
PN0086A.Exp2.Root.17.S35.L001 4.418245
PN0086A.Exp2.Root.18.S36.L001 5.425252

colnames(shannon)
dim(shannon)
shannon_info <- cbind(design,shannon)
shannon_info
shannon_info$SampleType <-ordered(shannon_info$SampleType, levels=c("Mymensingh.Dry", "Mymensingh.Wet", "Rajshahi.Dry", "Rajshahi.Wet"))
#boxplot
pdf("Fig2b.16S.shannon.pdf",width=12,height=6)
with(shannon_info, boxplot(shannon ~ Compartment, xlab="Samples", ylab="Shannon Index"))

Error

Error in stats::model.frame.default(formula = shannon ~ Compartment) : invalid type (list) for variable 'shannon'

Many thanks, Bioinfonext

Parfait
bioinfonext
  • You are using lowercase `shannon` instead of uppercase `Shannon` as first colname in `shannon_info` matrix. So R looks to original data frame assigned earlier. Consider a more distinct naming of objects than different cases for code maintainability and readability. – Parfait Jun 30 '20 at 15:48
  • Thanks for your quick help. Could you please also suggest how I can plot this figure with `facet_wrap` on SampleType, so that for each SampleType the two Compartments (soil and root) are plotted separately? I am trying this, but it gives the same figure: `with(shannon_info, boxplot(Shannon ~ Compartment, xlab="Samples", ylab="Shannon Index"), + facet_grid(~SampleType,scales="free",space="free"))` – bioinfonext Jun 30 '20 at 16:36
  • You are mixing APIs. Base graphics are not the same as ggplot2 graphics. But see the edited solution below for base graphics. – Parfait Jun 30 '20 at 16:57

1 Answer


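As commented, the error comes from a case mismatch: the column is named `Shannon`, but the formula uses `shannon`. R is case sensitive, so these are two different object references, and `shannon` resolves to the data frame you read in earlier (a list), which is why the error reports an invalid type (list). In addition to fixing the object name, consider building the data explicitly with `cbind.data.frame` and passing it through the `data` argument of `boxplot`, which defines the scope from which formula variables are taken. Per the docs: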

data     a data.frame (or list) from which the variables in formula should be taken.

shannon_info <- cbind.data.frame(design, shannon)
shannon_info$SampleType <- ordered(shannon_info$SampleType, levels=c("Mymensingh.Dry", "Mymensingh.Wet", "Rajshahi.Dry", "Rajshahi.Wet"))

pdf("Fig2b.16S.shannon.pdf", width=12, height=6)

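# names must have one label per box, i.e. one per Compartment level
boxplot(Shannon ~ Compartment, data=shannon_info,
        xlab="Samples", ylab="Shannon Index",
        names=levels(factor(shannon_info$Compartment)))

dev.off()  # close the PDF device so the file is written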
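
To see why the lowercase name fails, here is a minimal sketch with invented values (the numbers and the `found_lower` / `found_upper` names are hypothetical, not from the question). Inside `with()`, a name that is not a column is looked up in the calling environment, so `shannon` resolves to the whole data frame:

```r
# Toy stand-ins for the question's objects (values are invented)
shannon <- data.frame(Shannon = c(3.1, 4.9, 5.2, 4.8))
design  <- data.frame(Compartment = c("Root", "Root", "Soil", "Soil"))
shannon_info <- cbind.data.frame(design, shannon)

# `shannon` is not a column of shannon_info, so with() falls back to the
# global data frame `shannon`, and a data frame is a list
found_lower <- with(shannon_info, shannon)
is.list(found_lower)

# `Shannon` is a column, so it resolves to a plain numeric vector
found_upper <- with(shannon_info, Shannon)
is.numeric(found_upper)
```

A formula such as `shannon ~ Compartment` therefore hands a list to `model.frame`, while `Shannon ~ Compartment` hands it a numeric vector.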

For multiple plots by a factor, consider `by` together with `par(mfrow=...)` for a 4 x 2 subplot layout.

par(mfrow=c(4,2))

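# One boxplot per SampleType, titled with that SampleType
by(shannon_info, shannon_info$SampleType, function(sub)
   boxplot(Shannon ~ Compartment, data=sub,
           main=as.character(sub$SampleType[[1]]),
           xlab="Samples", ylab="Shannon Index",
           names=levels(factor(sub$Compartment)))
)

dev.off()  # close the PDF device if one is open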
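
If the goal is a single figure with one panel per SampleType and a shared y axis, ggplot2 faceting is one option. The following is a sketch, not a drop-in solution: it assumes the ggplot2 package is installed, and it uses an invented `toy` data frame in place of `shannon_info` (the Shannon values and the output filename are hypothetical).

```r
library(ggplot2)

# Toy stand-in for shannon_info; the Shannon values are invented
set.seed(1)
toy <- expand.grid(
  Compartment = c("Root", "Soil"),
  SampleType  = c("Mymensingh.Dry", "Mymensingh.Wet",
                  "Rajshahi.Dry", "Rajshahi.Wet"),
  rep         = 1:5
)
toy$Shannon <- rnorm(nrow(toy), mean = 4.5, sd = 0.5)

# Fix the panel order explicitly through the factor levels
toy$SampleType <- factor(toy$SampleType,
                         levels = c("Mymensingh.Dry", "Mymensingh.Wet",
                                    "Rajshahi.Dry", "Rajshahi.Wet"))

# Compartment on the x axis, one panel per SampleType; facet_wrap keeps
# the y axis shared across panels by default
p <- ggplot(toy, aes(x = Compartment, y = Shannon)) +
  geom_boxplot() +
  facet_wrap(~ SampleType, nrow = 1) +
  labs(x = "Compartment", y = "Shannon Index")

ggsave("Fig2b.16S.shannon.ggplot.pdf", p, width = 12, height = 6)
```

With the real data, replacing `toy` with `shannon_info` should be enough, provided it has the `Compartment`, `SampleType`, and `Shannon` columns.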
Parfait
  • Thanks for your help, but I am not getting the figure I have in mind. It should be a single figure with the Compartment names below on the x axis and the SampleType names on top, with the SampleType order as shown in the code above. All of these should share a single Shannon index on the y axis. Please help if this is possible, or would it be easier to plot with ggplot2? – bioinfonext Jun 30 '20 at 19:07
  • Please ask a new question after you research and make an earnest attempt. And clearly show (not tell) your desired result. – Parfait Jun 30 '20 at 19:09
  • Thanks, I have tried this code and I get a single plot, but the naming is not coming out as I wanted, and I am not sure whether it is correct: `with(shannon_info, boxplot(Shannon ~ SampleType+Compartment, xlab="Samples", ylab="Shannon Index"))` Thanks – bioinfonext Jun 30 '20 at 19:18
  • Naming what? Title? y-axis? x-axis? Also, you are not using the `data` argument approach after `cbind.data.frame` as this answer shows which should work for your formula. And there is no correct or not. If code renders what you need, then it is correct. – Parfait Jun 30 '20 at 19:26
  • The x axis labels come out like this: Mymensingh.Dry.Root Mymensingh.Wet.Root, and the plot shows some root samples with a higher Shannon index than soil samples, which I suspect is not true; the earlier figure was correct, with soil samples showing a higher Shannon index than root. Sorry for the multiple comments. I think I can try this with ggplot2, but I am not sure how to start. – bioinfonext Jun 30 '20 at 19:37
  • Can you post enough data (not just 5 rows of the same *SampleType*)? See [How to make a great R reproducible example](https://stackoverflow.com/q/5963269/1422451) using `dput` of your two data frames: `design` and `shannon`. – Parfait Jun 30 '20 at 19:45
  • Thanks for all your help. I will try to plot it with ggplot2, and will post a new question if I am not able to do it. – bioinfonext Jun 30 '20 at 20:02
  • Use `names` in `boxplot`. See edit using your posted data. – Parfait Jun 30 '20 at 20:02