I'm testing my Python script:

#!/usr/bin/python

import os,sys
import glob
import commands
import gzip
from itertools import islice

f1=gzip.open("/home/xug/scratch/test_trim.fastq","w")
LIST=[]
N = 4
with open("/home/xug/scratch/test.fastq", "r") as f:
    while True:
        line_group = list(islice(f, N))
        if not line_group:
            break
        l3=line_group[3].rstrip()
        l3_trim=commands.getoutput("sed 's/\(.\)B*$/\1/g'" + l3)
        #l3_to = subprocess.Popen(["sed 's/\(.\)B*$/\1/g'",l3],
                                  #stdout=subprocess.PIPE,bufsize=1)
        #l3_trim=l3_to.stdout
        if ( float(len(l3_trim))/float(len(l3)) > 0.70 ):
               LIST.append(line_group[0])
               LIST.append(line_group[1][:int(len(l3_trim))])
               LIST.append(line_group[2])
               LIST.append(l3_trim)

    output=f1.writelines(LIST)

However I got errors like:

sh: -c: line 0: unexpected EOF while looking for matching ``'
sh: -c: line 1: syntax error: unexpected end of file
sh: -c: line 0: unexpected EOF while looking for matching ``'
sh: -c: line 1: syntax error: unexpected end of file
sh: -c: line 0: unexpected EOF while looking for matching ``'
sh: -c: line 1: syntax error: unexpected end of file
sh: -c: line 0: unexpected EOF while looking for matching ``'
sh: -c: line 1: syntax error: unexpected end of file
sh: -c: line 0: unexpected EOF while looking for matching ``'
sh: -c: line 1: syntax error: unexpected end of file
sh: -c: line 0: unexpected EOF while looking for matching ``'
sh: -c: line 1: syntax error: unexpected end of file

These errors repeat until the while loop eventually ends.

LookIntoEast
  • Start with the most basic question: What does `l3` contain? – Greg Hewgill Nov 30 '11 at 01:46
  • l3, is a variable: l3=list[3] – LookIntoEast Nov 30 '11 at 01:58
  • I see that in your code (actually, `l3=line_group[3]`). But that still tells us nothing about what the *contents* of that variable are. If you don't know, try the `print` statement. – Greg Hewgill Nov 30 '11 at 01:59
  • Very good question. Please see my new edit. (There's something wrong with the printed content of l3, actually.) – LookIntoEast Nov 30 '11 at 02:10
  • There's something wrong with number of elements you get back from islice(). There's no element 3. – favoretti Nov 30 '11 at 02:17
  • Why is there no element 3? There are 4 elements in each list(islice)... – LookIntoEast Nov 30 '11 at 02:23
  • What happens when your loop encounters the end of the file? There's no way out of your `while` loop. – Greg Hewgill Nov 30 '11 at 02:45
  • Yeah, I understand... But another problem is: commands.getoutput("sed 's/\(.\)B*$/\1/g' " + l3). There must be something wrong with this command. – LookIntoEast Nov 30 '11 at 02:47
  • It appears that `l3` contains characters such as the backtick, which means something special to the shell. The shell is giving errors because you haven't quoted those characters properly. But why are you using `sed` to process your data anyway? That's going to be *really* slow. Instead, do whatever you need to do using Python's built-in string handling functions. – Greg Hewgill Nov 30 '11 at 03:03
  • Yeah, thanks a lot. I use sed because I used to write bash code for this problem, and I'm now changing from bash to Python. l3 used to have a '\n' at the end, but I've deleted that using rstrip(). – LookIntoEast Nov 30 '11 at 03:11
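
The end-of-file concern raised in the comments can be illustrated with a minimal sketch. It assumes Python 3, and `read_records` is a hypothetical helper name, not something from the original script. The idea is to stop when `islice` returns nothing and to fail loudly when the last group has fewer than four lines, instead of hitting an IndexError on `line_group[3]`:

```python
import io
from itertools import islice


def read_records(handle, n=4):
    """Yield lists of n lines with newlines removed; stop at end of input.

    Hypothetical helper: raises ValueError when the final group is
    shorter than n, so a truncated record cannot cause an IndexError.
    """
    while True:
        group = [line.rstrip("\n") for line in islice(handle, n)]
        if not group:
            break  # islice returned nothing: end of input
        if len(group) < n:
            raise ValueError("incomplete record: %r" % (group,))
        yield group


# Demo data held in memory instead of the real FASTQ file.
demo = io.StringIO("@r1\nACGT\n+\nIIBB\n@r2\nTTGA\n+\nIIII\n")
records = list(read_records(demo))
```

With a real file, the same generator could be fed the handle returned by `open(...)`; the in-memory data here is only for illustration.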

1 Answer

(continuing from the comments above)

To remove trailing `B` characters from a string using Python's built-in re module, try:

import re

l3_trim = re.sub(r"B*$", "", l3)
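
As a rough sketch of how that substitution might fit into the per-record logic from the question (assuming Python 3, records already split into four lines with newlines removed, and a hypothetical function name `trim_record`), the trimming and the 70% length check could look like this:

```python
import re


def trim_record(record, min_fraction=0.70):
    """Return the trimmed 4-line record, or None if too much was cut.

    record is [header, sequence, separator, quality], newlines removed.
    The 0.70 default mirrors the threshold used in the question.
    """
    header, sequence, separator, quality = record
    trimmed = re.sub(r"B*$", "", quality)  # same effect as quality.rstrip("B")
    if not quality or len(trimmed) / len(quality) <= min_fraction:
        return None
    # Cut the sequence to the same length as the trimmed quality string.
    return [header, sequence[:len(trimmed)], separator, trimmed]


kept = trim_record(["@r1", "ACGTACGTAC", "+", "IIIIIIIIBB"])
dropped = trim_record(["@r2", "ACGTACGTAC", "+", "IIIIBBBBBB"])
```

Because the newlines are stripped up front, the caller would write each kept record back with something like `"\n".join(kept) + "\n"`, so every output line ends consistently.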
Greg Hewgill
  • Oh, thanks so much... but I'm still curious about how to use commands.getoutput with sed. – LookIntoEast Nov 30 '11 at 03:07
  • You have to quote the shell metacharacters. See [How to escape os.system() calls in Python?](http://stackoverflow.com/questions/35817/how-to-escape-os-system-calls-in-python) for more information. – Greg Hewgill Nov 30 '11 at 03:10
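
For the sed route, here is one possible sketch, assuming Python 3.7+ and a `sed` binary on the PATH; `trim_with_sed` and `SED_SCRIPT` are hypothetical names. Two details matter beyond quoting. First, sed treats trailing arguments as file names, so the data has to arrive on stdin rather than being appended to the command. Second, `"\1"` in a non-raw Python string is an octal escape, so the sed script needs a raw string. Passing an argument list avoids the shell entirely, and `shlex.quote` covers the case where a shell command string is unavoidable:

```python
import shlex
import subprocess

# Raw string, so that \1 reaches sed as a backreference.
SED_SCRIPT = r"s/\(.\)B*$/\1/"


def trim_with_sed(quality):
    """Pipe quality to sed on stdin; no shell is involved."""
    result = subprocess.run(
        ["sed", SED_SCRIPT],
        input=quality + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


# If a shell command string is unavoidable, quote every piece of data.
quoted = shlex.quote("II`BB")
command = "printf '%s\\n' " + quoted + " | sed " + shlex.quote(SED_SCRIPT)
```

Starting one sed process per record is still slow compared with doing the substitution in Python, as noted in the comments on the question.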