0

I have several Fastq files where I am missing lines of data sporadically throughout. For example here is a correct read with all four lines

@M01698:289:000000000-AVDJ5:1:1101:15411:3896 1:N:0:GTGAATCC+TCCAGGTA

CGCGGCGATGGCGGAGCTGAATTACATTCCCAAC

+

GGGGGGGGGGGGGGGGGHHHHHHHHHHHHHHHHG

and then the very next read is just two lines without the sequence and quality score data

@M01698:289:000000000-AVDJ5:1:1101:19764:3903 1:N:0:GTGAATCC+TCCAGGTA

+

Is there a way of finding these specific, not complete reads and simply adding an empty line above and below to make it a complete read?

    g=open(New file,"w")
    while True:
        ID = f.readline()
        if ID == '':
            break
        seq = f.readline()
        ID2 = f.readline()
        qs = f.readline()
    if seq.contains("+"):
        newseq=seq.replace("/n" "+" "/n")
    else:
        newseq=seq

    g.write(ID)
    g.write(newseq)
    g.write(ID2)
    g.write(qs)
Lauren Cooper
  • 11
  • 1
  • 3
  • 1
    Maybe but It looks like you want us to write some code for you. While many users are willing to produce code for a coder in distress, they usually only help when the poster has already tried to solve the problem on their own. A good way to demonstrate this effort is to include the code you've written so far, example input (if there is any), the expected output, and the output you actually get (console output, tracebacks, etc.). The more detail you provide, the more answers you are likely to receive. Check the [FAQ] and [ask]. – MooingRawr Jan 04 '18 at 21:58
  • What have you tried so far? Please provide some of your code so we can help edit it. – serk Jan 04 '18 at 22:11
  • What in your code doesn't work as you expected? – chickity china chinese chicken Jan 05 '18 at 00:06
  • I only get the first two lines of my original fastq as output. Now that I think about it more, this won't give me what I need because I would need those instances to also have altered qs. So I think I need something like... if ID.contains("@M") and seq=='+' and ID2.contains("@M"): newseq=("/n") newID2=('+') newqs=('/n') – Lauren Cooper Jan 05 '18 at 00:07
  • But I get an error for the first line...if ID.contains("@M") and seq=='+' and ID2.contains("@M"): AttributeError: 'str' object has no attribute 'contains' – Lauren Cooper Jan 05 '18 at 00:15
  • try using [`in`](https://stackoverflow.com/questions/19775692/use-and-meaning-of-in-in-an-if-statement), `if '@M' in ID` – chickity china chinese chicken Jan 05 '18 at 00:16

0 Answers0