
I'm trying to reproduce the dendrogram results of this paper, which come from a specific 16S rRNA analysis.

But I don't know whether there is a standard method for handling or analyzing this kind of data, so I've been trying on my own. Below is a summary.

The methods section says: "The resulting FASTQ files were deposited at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA386442. MiSeq paired-end raw sequence forward and reverse reads were subsequently merged using ea-utils v1.1.2 with standard settings, followed by a split library step from QIIME v1.9.1 and removal sequence reads shorter than 200 nucleotides, reads that contained ambiguous bases, or reads with an average quality score of less than 30. "
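The three read filters in that quote (length under 200 nt, ambiguous bases, mean quality under 30) can be approximated with a small awk filter. This is a minimal sketch, not QIIME's split_libraries_fastq.py: it assumes 4-line FASTQ records with Phred+33 quality encoding, treats any base other than A/C/G/T as ambiguous, and `filter_reads` is a hypothetical helper name.

```shell
# Hypothetical helper approximating the quoted read filters (not QIIME's code).
# Assumes 4-line FASTQ records and Phred+33 quality encoding.
filter_reads() {
  awk -v minlen=200 -v minq=30 '
    BEGIN {
      # Lookup table: printable quality character -> ASCII code
      for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
    }
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      qual = $0
      if (length(seq) < minlen || length(qual) == 0) next  # too short or malformed
      if (seq ~ /[^ACGTacgt]/) next                         # ambiguous base (N, IUPAC code)
      total = 0
      for (i = 1; i <= length(qual); i++) total += ord[substr(qual, i, 1)] - 33
      if (total / length(qual) < minq) next                 # mean Phred score below 30
      print header; print seq; print plus; print qual
    }
  '
}
```

Usage would be something like `filter_reads < merged.fastq > filtered.fastq`, where both file names are placeholders.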

So I downloaded the SRA files using the SRA Toolkit, running this loop in the terminal:

for n in {141..188}; do prefetch "SRR5577$n"; done

Then I converted them to FASTQ files using:

for n in {141..188}; do fastq-dump "SRR5577$n"; done

However, at the merge step I can't use fastq-join or any other tool from the ea-utils package (from GitHub). It seems the data aren't in the correct format.
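One quick way to check what fastq-dump actually wrote is to look at the read lengths. Without `--split-files`, fastq-dump writes both mates of a paired spot as a single concatenated sequence, so reads in a file such as SRR5577141.fastq would be roughly twice the length of one MiSeq read. The sketch below assumes 4-line FASTQ records; `read_length_summary` is a hypothetical helper name.

```shell
# Hypothetical helper: report read count and min/mean/max read length
# for a FASTQ file with 4 lines per record.
read_length_summary() {
  awk 'NR % 4 == 2 {
         len = length($0)
         n++; sum += len
         if (n == 1 || len < min) min = len
         if (len > max) max = len
       }
       END {
         if (n == 0) { print "no reads"; exit 1 }
         printf "reads=%d min=%d mean=%.1f max=%d\n", n, min, sum / n, max
       }' "$1"
}
```

For example, `read_length_summary SRR5577141.fastq`; a single peak at about twice the expected read length would suggest the mates were concatenated rather than split.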

Did I do this correctly? Where can I learn more about this kind of analysis?

zx8754

1 Answer


I would suggest using the --split-files option of fastq-dump, e.g.:

for n in {141..188}; do fastq-dump --split-files "SRR5577$n"; done

It appears that the data are paired-end; otherwise you wouldn't need to merge them. --split-files will give you separate forward and reverse read files (e.g. SRR5577141_1.fastq and SRR5577141_2.fastq), which you would then presumably feed to ea-utils.
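Splitting and merging could be combined per run roughly as follows. This is a sketch, assuming fastq-dump (sra-tools) and fastq-join (ea-utils) are on the PATH, and assuming fastq-join's default parameters correspond to the paper's "standard settings"; `merge_run` is a hypothetical helper name. With a `-o` template containing `%`, fastq-join substitutes `join`, `un1` and `un2` for the merged and unmerged reads.

```shell
# Hypothetical helper: split one SRA run into mates, then merge them with fastq-join.
# Assumes fastq-dump (sra-tools) and fastq-join (ea-utils) are installed.
merge_run() {
  local acc=$1 tool
  for tool in fastq-dump fastq-join; do
    command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool" >&2; return 1; }
  done
  # Writes ${acc}_1.fastq (forward) and ${acc}_2.fastq (reverse)
  fastq-dump --split-files "$acc" || return 1
  # Default fastq-join settings; '%' becomes join, un1 and un2 in the output names
  fastq-join "${acc}_1.fastq" "${acc}_2.fastq" -o "${acc}.%.fastq"
}
```

Usage over the same runs as the question: `for n in {141..188}; do merge_run "SRR5577$n"; done`. The merged reads for each run would then be in files like SRR5577141.join.fastq.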

Maximilian Press