Questions tagged [fastq]

FASTQ files are used in bioinformatics to store sequence information and sequencing quality scores.

FASTQ format is a text-based format for storing both a biological sequence (usually a nucleotide sequence) and its corresponding quality scores. The sequence letter and the quality score are each encoded with a single ASCII character for brevity.

[Wikipedia]
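
As a rough illustration of the description above, here is a minimal Python sketch that reads four-line FASTQ records and decodes the quality characters into integer scores. It assumes strict four-line records and a Phred+33 ASCII offset (common for Sanger and recent Illumina data, but not stated in the tag description); the names `read_fastq` and `phred_scores` are hypothetical helpers, not part of any library.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) tuples.

    Assumes every record is exactly four lines:
    '@header', sequence, '+' separator, quality string.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality  # drop the leading '@'


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode one ASCII character per base into an integer score.

    The offset of 33 is an assumption; older Illumina files used 64.
    """
    return [ord(ch) - offset for ch in quality]


# Tiny in-memory example with a made-up record.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
for name, seq, qual in records:
    print(name, seq, phred_scores(qual))
```

For real data, an established parser such as Biopython's `SeqIO` is usually a safer choice than a hand-rolled reader, since it also handles the different quality encodings.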

257 questions

4 answers

faster membership testing in python than set()

I have to check the presence of millions of elements (strings of 20-30 letters) in a list containing 10-100k of those elements. Is there a faster way of doing that in Python than set()? import sys #load ids ids = set( x.strip() for x in open(idfile) ) for…

asked Aug 18 '11 at 15:47

Leszek

14 answers

Converting FASTQ to FASTA with SED/AWK

I have data that always comes in blocks of four in the following format (called FASTQ): @SRR018006.2016 GA2:6:1:20:650 length=36 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGN +SRR018006.2016 GA2:6:1:20:650…

awk sed bioinformatics fasta fastq

asked Oct 09 '09 at 07:22

neversaint

2 answers

How do I use parallel programming/multi threading in my bash script?

This is my script: #!/bin/bash #script to loop through directories to merge fastq files sourcedir=/path/to/source destdir=/path/to/dest for f in $sourcedir/* do fbase=$(basename "$f") echo "Inside $fbase" zcat $f/*R1*.fastq.gz |…

multithreading bash parallel-processing fastq

asked Aug 22 '13 at 15:17

Komal Rathi

4 answers

bash: /bin/ls: Argument list too long

I need to make a list of a large number of files (40,000 files) like below: ERR001268_1_100.fastq ERR001268_2_156.fastq ERR001753_2_78.fastq ERR001268_1_101.fastq ERR001268_2_157.fastq ERR001753_2_79.fastq ERR001268_1_102.fastq …

list ls fastq

asked Aug 11 '11 at 17:16

LookIntoEast

3 answers

Read list of files on unix and run command

I am pretty new at shell scripting and I have been struggling all day to figure out how to perform a "for" command. Essentially, what I am trying to do is the following: I have a list.txt file with a bunch of names: name1 name2 name3 for every name…

bash list loops unix fastq

asked Aug 03 '13 at 02:25

user2647734

2 answers

regex: matching several patterns derived from a simple string

I have the following task: starting with a 30-character-long pattern sequence (it is actually a DNA sequence, let's call it P30), I need to find in a text file all lines starting (^agacatacag...) with the exact P30, then with the last 29 characters of the 30, 28…

python regex d fastq

asked Dec 12 '14 at 18:31

darked89

4 answers

How can I make my Python script faster?

I'm pretty new to Python, and I have written a (probably very ugly) script that is supposed to randomly select a subset of sequences from a fastq-file. A fastq-file stores information in blocks of four rows each. The first row in each block starts…

python performance bioinformatics fastq

asked Dec 04 '14 at 22:00

Sandra

0 answers

Large files for GitHub CICD

I have a GitHub repo of a pipeline that requires very large files as input (basic test datasets would be around 1-2 Gb). I thought about circumventing this by doing CICD locally, but this will not allow the CICD to run if other people want to…

github continuous-integration continuous-deployment fastq

asked Mar 04 '21 at 11:56

João Sequeira

3 answers

Filter sequences with more than 8 same consecutive nucleotides in a fastq file?

I want to filter the sequences that have more than 8 identical consecutive nucleotides, like "GGGGGGGG", "CCCCCCCC", etc., in my fastq files. How should I do that?

bioinformatics fastq

asked Nov 02 '19 at 23:18

Dawud

4 answers

Map files into memory

I will explain what my problem is first, as it's important to understand what I want :-). I'm working on a pipeline written in Python that uses several external tools to perform several genomics data analyses. One of these tools works with very huge…

python memory operating-system fifo fastq

asked Oct 12 '12 at 11:41

guillemch

3 answers

Grep that tolerates mismatches to subset .fastq

I am working with bash on a linux cluster. I am trying to extract reads from a .fastq file if they contain a match to a queried sequence. Below is an example .fastq file containing three reads. $ cat example.fastq @SRR1111111.1…

awk grep bioinformatics fastq sequencing

asked Dec 13 '18 at 18:58

Paul

1 answer

Renaming interleaved fastq headers with biopython

For ease of use and compatibility with another downstream pipeline, I'm attempting to change the names of fastq sequence ids using biopython. For example... going from headers that look like this: @D00602:32:H3LN7BCXX:1:1101:1205:2112…

python replace bioinformatics biopython fastq

asked Oct 10 '18 at 08:22

Gunther

0 answers

Error sh: 1: fastqc: not found while calling fastqc

I have checked many times that fastqc is installed in the bin folder, and library("fastqcr") does not give any error either, yet I still get the error sh: 1: fastqc: not found for the following command: fastqc(fq.dir = "~/WES_Pipeline/Data", #…

r bioinformatics fastq

asked Jan 07 '18 at 11:01

Lot_to_learn

3 answers

How can I do a transparent gzip uncompress from both stdin and files in perl?

I've written a few scripts for processing FASTA/FASTQ files (e.g. fastx-length.pl), but would like to make them more generic and accept both compressed and uncompressed files as both command line parameters and as standard input (so that the scripts…

fasta fastq compression perl

asked Jun 10 '17 at 23:14

gringer

2 answers

Efficient way to TRANSLATE every Nth string in bash or R

Thank you for taking the time to look at this. I have a fastq file and I want to translate it to the complement, but not the reverse complement, something like this: @Some header example:1: ACTGAGACTCGATCA + S0m3_Qu4l1t13s& Translated…

r bash awk fastq

asked Apr 08 '15 at 21:22

Edahi
