I have a perl script that can be executed from the console as follows:

perl perlscript.pl -i input.txt -o output.txt --append

I want to execute this script from my Python code. I found that subprocess.Popen can launch perl and pass my arguments to it. However, I also want to pass a variable (produced by splitting up a text file) in place of input.txt. I have tried the following, but it does not work and fails with a TypeError on line 8:

import re, shlex, subprocess, StringIO
f=open('fulltext.txt','rb')
text= f.read()
l = re.split('\n\n',str(text))
intxt = StringIO.StringIO()
for i in range(len(l)):
    intxt.write(l[i])
    command_line='perl cnv_ltrfinder2gff.pl -i '+intxt+' -o output.gff --append'
    args=shlex.split(command_line)
    p = subprocess.Popen(args)

Is there a workaround for this?

EDIT: Here is a sample of the file fulltext.txt. Entries are separated by a blank line.

Predict protein Domains 0.021 second
>Sequence: seq1 Len:13143 [1] seq1 Len:13143 Location : 9 - 13124 Len: 13116 Strand:+ Score    : 6 [LTR region similarity:0.959] Status   : 11110110000 5'-LTR   : 9 - 501 Len: 493 3'-LTR   : 12633 - 13124 Len: 492 5'-TG    : TG , TG 3'-CA    : CA , CA TSR      : NOT FOUND Sharpness: 1,1 Strand + : PBS   : [14/20] 524 - 543 (LysTTT) PPT   : [12/15] 12553 - 12567

Predict protein Domains 0.019 second
>Sequence: seq5 Len:11539 [1] seq5 Len:11539 Location : 7 - 11535 Len: 11529 Strand:+ Score    : 6 [LTR region similarity:0.984] Status   : 11110110000 5'-LTR   : 7 - 506 Len: 500 3'-LTR   : 11036 - 11535 Len: 500 5'-TG    : TG , TG 3'-CA    : CA , CA TSR      : NOT FOUND Sharpness: 1,1 Strand + : PBS   : [15/22] 515 - 536 (LysTTT) PPT   : [11/15] 11020 - 11034

I want to separate them and pass each entry block to the perl script. All the files are in the same directory.

  • Can the Perl script read the input from stdin instead of a file? – choroba Apr 23 '15 at 13:52
  • No it can't. The script I am using is: [link](https://github.com/jestill/dawgpaws/blob/fb0a40506be1ed8afce0049b6cfe3e4b52cd58dc/scripts/cnv_ltrfinder2gff.pl) – Rimjhim Roy Choudhury Apr 23 '15 at 14:04
  • @RimjhimRoyChoudhury: the code says it can accept input from stdin (it even reminds you about it: [*"Expecting input from STDIN"*](https://github.com/jestill/dawgpaws/blob/fb0a40506be1ed8afce0049b6cfe3e4b52cd58dc/scripts/cnv_ltrfinder2gff.pl#L294)). Try to omit `infile` option or pass an empty `''` filename or `'-'` or `/dev/stdin` in bash. – jfs Apr 25 '15 at 23:11
  • On Unix, you could [use named pipes or `/dev/fd/N` filenames](http://stackoverflow.com/a/28840955/4279) to avoid writing the input data on disk if a child process reads only from files. – jfs Apr 25 '15 at 23:18

2 Answers

You might be interested in the os module and string formatting.

Edit

I think I understand what you want now. Correct me if I am wrong, but I think:

  • You want to split your fulltext.txt into blocks.
  • Every block contains a seq(number).
  • You want to run your Perl script once for every block, using the file named seq(number).txt as the input file.

If this is what you want, you could use the following code.

import os

in_file = 'fulltext.txt'
seq = []

with open(in_file, 'r') as handle:
    for line in handle:
        # header lines look like ">Sequence: seq1 Len:13143 ..."
        if line.startswith(">"):
            seq.append(line.rstrip().split(" ")[1])

for x in seq:
    # expects an input file named <seq>.txt for every sequence name found
    command = "perl cnv_ltrfinder2gff.pl -i %s.txt -o output.txt --append" % x
    os.system(command)
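
If the per-sequence .txt files do not already exist on disk, one possible workaround is to write each block to a temporary file and pass that path with -i. The following is only a minimal sketch and has not been tried against cnv_ltrfinder2gff.pl itself. The helper names `split_blocks` and `run_on_block` are made up for illustration, and it assumes entries are separated by a blank line, as in the sample from the question.

```python
import os
import subprocess
import tempfile


def split_blocks(text):
    """Split the text on blank lines and drop empty entries."""
    return [block for block in text.split('\n\n') if block.strip()]


def run_on_block(block, command):
    """Write one block to a temporary file and run `command` with '-i <path>'.

    `command` is the argument list without the input option, for example
    ['perl', 'cnv_ltrfinder2gff.pl', '-o', 'output.gff', '--append'].
    Returns the exit status of the child process.
    """
    # delete=False lets us close the file before the child opens it,
    # which matters on Windows; we remove it ourselves afterwards.
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
        tmp.write(block)
        path = tmp.name
    try:
        return subprocess.call(command + ['-i', path])
    finally:
        os.remove(path)


# Possible usage (file and script names taken from the question):
#
# with open('fulltext.txt') as handle:
#     for block in split_blocks(handle.read()):
#         run_on_block(block, ['perl', 'cnv_ltrfinder2gff.pl',
#                              '-o', 'output.gff', '--append'])
```

Passing the command as a list avoids building a shell string by hand, so file names with spaces do not need quoting.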
zazga

The docs for --infile option:

Path of the input file. If an input file is not provided, the program will expect input from STDIN.

You could omit --infile and pass input via a pipe (stdin) instead:

#!/usr/bin/env python
from subprocess import Popen, PIPE

with open('fulltext.txt') as file: # read input data
    blocks = file.read().split('\n\n')

# run a separate perl process for each block
args = 'perl cnv_ltrfinder2gff.pl -o output.gff --append'.split()
for block in blocks:
    p = Popen(args, stdin=PIPE, universal_newlines=True)
    p.communicate(block)
    if p.returncode != 0:
        print('non-zero exit status: %s on block: %r' % (p.returncode, block))
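
On Python 3.5+, the same idea can be sketched with subprocess.run, which accepts the stdin text through its input argument. This is a hedged variant rather than a replacement for the code above: `feed_blocks` is a hypothetical helper name, and the skipping of empty blocks is my own addition, since split('\n\n') can yield an empty string when the file ends with blank lines.

```python
import subprocess


def feed_blocks(blocks, command):
    """Run `command` once per non-empty block, sending the block on stdin.

    Returns a list with one exit status per block that was actually run.
    """
    statuses = []
    for block in blocks:
        if not block.strip():
            continue  # skip empty entries, e.g. from trailing blank lines
        # input= is written to the child's stdin; universal_newlines=True
        # puts the pipe in text mode so a str can be passed.
        done = subprocess.run(command, input=block, universal_newlines=True)
        statuses.append(done.returncode)
    return statuses


# Possible usage (script name and options taken from the question):
#
# with open('fulltext.txt') as handle:
#     blocks = handle.read().split('\n\n')
# feed_blocks(blocks, ['perl', 'cnv_ltrfinder2gff.pl',
#                      '-o', 'output.gff', '--append'])
```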

You can run several perl processes concurrently:

from multiprocessing.dummy import Pool # use threads

def run(indexed_block):
    i, block = indexed_block  # tuple parameters in the signature are Python 2 only
    filename = 'out%03d.gff' % i
    args = ['perl', 'cnv_ltrfinder2gff.pl', '-o', filename]
    p = Popen(args, stdin=PIPE, universal_newlines=True, close_fds=True)
    p.communicate(block)
    return p.returncode, filename

exit_statuses, filenames = zip(*Pool().map(run, enumerate(blocks, start=1)))

It runs several child processes in parallel; by default, the number equals the number of CPUs on your system. You can specify a different number of worker threads by passing it to Pool(), e.g., Pool(4).

jfs