I have 48 sample files, and some of them contain fewer than ten reads. I would like to remove these samples from my list of files. I can only find information on filtering reads within a file, but I need to filter whole files based on the number of reads they contain. I cannot do this by file size. Is there a way to do it simply by the number of reads in each file?

  • What's a "read" in this context? Could you show what sample files would look like? https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Andrea M Jul 15 '22 at 19:47
  • Is this real sequencing data or is this a hypothetical question? A real fastq would typically contain hundreds of thousands to millions of reads. If you have real-world fastqs with fewer than ten reads, then something has likely already gone very wrong with your experiment. How many reads are in the good fastqs? Do you know what platform was used for the sequencing? – EAW Jul 18 '22 at 10:05
  • It is real sequencing data, but from a past experiment that I am using as a trial run in my new job. There were 48 samples with varying read counts: most had at least 30,000 reads, but 9 of them had reads in the tens. I found a way to filter them out, though:
        # Delete files with fewer than 13 reads
        for (x in R1s) { if ((countLines(x) / 4) < 13) { unlink(x) } }
        for (x in R2s) { if ((countLines(x) / 4) < 13) { unlink(x) } }
        # Keep only the pairs where both mate files still exist
        exists <- file.exists(R1s) & file.exists(R2s)
        R1s <- R1s[exists]
        R2s <- R2s[exists]
    – Sara Nicholson Jul 19 '22 at 13:53
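
The approach in the last comment (count lines, divide by four, drop anything under a threshold) can also be sketched without deleting anything from disk. The version below is only a minimal illustration in base R, under a few assumptions: the files are standard four-lines-per-record FASTQ, `R1s` and `R2s` are parallel character vectors of paths as in the comment, and `count_reads` and `filter_pairs` are hypothetical helper names, not functions from any package. `countLines` in the comment presumably comes from the ShortRead package; here the lines are counted with `readLines` instead so the sketch has no dependencies.

```r
# Count reads in one FASTQ file, assuming the standard
# four-lines-per-record layout. gzfile() also reads uncompressed
# files, so both .fastq and .fastq.gz paths should work.
count_reads <- function(path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  n_lines <- 0
  repeat {
    # Read in chunks so large files are not held in memory at once
    chunk <- readLines(con, n = 100000L)
    if (length(chunk) == 0L) break
    n_lines <- n_lines + length(chunk)
  }
  n_lines %/% 4
}

# Keep only the pairs in which BOTH mates reach the threshold.
# Nothing is deleted; the function just returns the filtered vectors.
# min_reads = 13 mirrors the threshold used in the comment.
filter_pairs <- function(r1, r2, min_reads = 13) {
  stopifnot(length(r1) == length(r2))
  n1 <- vapply(r1, count_reads, numeric(1))
  n2 <- vapply(r2, count_reads, numeric(1))
  keep <- unname(n1 >= min_reads & n2 >= min_reads)
  list(R1 = r1[keep], R2 = r2[keep])
}

# Usage, assuming R1s and R2s already exist:
# kept <- filter_pairs(R1s, R2s)
# R1s <- kept$R1
# R2s <- kept$R2
```

Filtering the path vectors rather than calling `unlink` keeps the original files available and avoids leaving an orphaned mate on disk when only one file of a pair falls below the threshold.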

0 Answers