
My function run_deinterleave() is meant to copy the code from the file deinterleave.sh, replace the placeholder (sra_data) with a file name entered by the user, and then run the result on the command line.

def run_deinterleave():
    codes = open('Project/CODE/deinterleave.sh')
    codex = codes.read()
    print(inp_address)
    codex = codex.replace('sra_data', inp_address)
    #is opening this twice creating another pipeline?
    stream = os.popen(codex)
    codes.close()

    self.txtarea.insert(END,codex)
#stuff

However, I keep getting this error:

/bin/sh: 5: Syntax error: "(" unexpected

The code in deinterleave.sh works fine and produces two individual files given an interleaved paired end sra_file (an output file from genetic sequencing machines, I think :P)

#1deinterleave paired end fastq file 
 paste - - - - - - - - < sra_data \
| tee >(cut -f 1-4 | tr "\t" "\n" > /home/lols/Project/reads-1.fq) \
| cut -f 5-8 | tr "\t" "\n" > /home/lols/Project/reads-2.fq
tripleee
Lauttred

2 Answers


As the error message shows, the code was interpreted by /bin/sh; if you executed
/bin/sh Project/CODE/deinterleave.sh, you'd get the same error, because the process substitution >(…) is a Bash extension not understood by /bin/sh.
Besides, since you don't communicate with the shell code, you don't need pipes at all. So instead of os.popen I'd use subprocess.run, which lets you specify Bash as the shell.

   subprocess.run(codex, shell=True, executable="bash")
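
A slightly fuller sketch of the same call, in case it helps. The helper name run_script_with_bash is hypothetical, and the shlex.quote step is an addition that is not part of the answer above: it assumes the placeholder appears unquoted in the script (as it does in deinterleave.sh) and protects file names that contain spaces or shell metacharacters. It also assumes a bash binary can be found on PATH.

```python
import shlex
import subprocess


def run_script_with_bash(script_text, placeholder, filename):
    """Replace placeholder with a shell-quoted filename and run the text under Bash.

    Hypothetical helper: assumes the placeholder is not already inside
    quotes in script_text and that bash is available on PATH.
    """
    command = script_text.replace(placeholder, shlex.quote(filename))
    return subprocess.run(
        command,
        shell=True,
        executable="bash",    # use Bash instead of the default /bin/sh
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # >(...) is the process substitution that /bin/sh rejected in the question.
    demo = "echo sra_data | tee >(cat > /dev/null)"
    result = run_script_with_bash(demo, "sra_data", "my reads.fq")
    print(result.stdout, end="")
```

With the function from the question, the call would be something like run_script_with_bash(codex, 'sra_data', inp_address), where codex is the script text before the placeholder is replaced.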
Armali

The absolutely best fix is probably to replace the shell script with native Python code; but without a specification and/or sample input, I don't think we can tell you exactly how to do that.

An immediate and trivial fix is to change deinterleave so that it accepts an input file parameter.

#!/usr/bin/env bash
paste - - - - - - - - < "${1-sra_data}" |
tee >(cut -f 1-4 | tr "\t" "\n" > "${2-/home/lols/Project/reads-1.fq}") |
cut -f 5-8 | tr "\t" "\n" > "${3-/home/lols/Project/reads-2.fq}"

This refactoring also allows you to specify the names of the output files as the second and third command-line arguments.

Also, a Bash script really should not have a .sh extension, so probably take that out.

Explicitly naming Bash in the shebang line should solve the error message you got when running Bash code in sh; perhaps see also Difference between sh and bash.

With that, your Python code can be reduced to something like

subprocess.run(
    ['Project/CODE/deinterleave', inp_address],
    # probably a good idea
    check=True)

though I don't exactly understand the rest of the surrounding function, so it's not clear how exactly to rewrite it.
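
To illustrate what check=True buys you, here is a minimal sketch. The helper name run_checked is made up for this example, and the child process is a stand-in (the Python interpreter itself) so the snippet does not depend on the deinterleave script being present. With check=True, a non-zero exit status raises subprocess.CalledProcessError instead of passing silently; a missing executable raises FileNotFoundError regardless.

```python
import subprocess
import sys


def run_checked(argv):
    """Run argv without a shell and return an (ok, message) pair.

    Hypothetical helper for illustration only.
    """
    try:
        subprocess.run(argv, check=True, capture_output=True, text=True)
    except FileNotFoundError:
        # The program named in argv[0] does not exist.
        return False, f"command not found: {argv[0]}"
    except subprocess.CalledProcessError as exc:
        # Raised only because check=True was passed.
        return False, f"exit status {exc.returncode}: {exc.stderr.strip()}"
    return True, "ok"


if __name__ == "__main__":
    # Stand-in child process that exits with status 3.
    print(run_checked([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

In the real program the argument list would be ['Project/CODE/deinterleave', inp_address], and the script file needs its executable bit set for this form of the call to work.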

I think the shell script could be reimplemented something like

with open(inp_address, 'r') as sra_data, open(
    '/home/lols/Project/reads-1.fq', 'w') as first, open(
        '/home/lols/Project/reads-2.fq', 'w') as second:
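    # Each record is four lines; records alternate between the two mates,
    # so lines 0-3 of every group of eight go to first, lines 4-7 to second.
    for idx, line in enumerate(sra_data):
        target = first if idx % 8 < 4 else second
        target.write(line)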
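
A parameterised variant of the same idea, as a sketch rather than a drop-in replacement. The function names deinterleave and deinterleave_files are my own, and it assumes every record is exactly four lines and that mates strictly alternate, which is what the paste - - - - - - - - pipeline assumes too. Unlike the shell pipeline, it raises an error if the input stops in the middle of a pair; that check is an addition.

```python
import io


def deinterleave(src, first, second, lines_per_record=4):
    """Copy alternating records from src into first and second.

    src is any iterable of lines; first and second are writable text files.
    Returns the number of complete read pairs written.
    """
    pair_size = 2 * lines_per_record
    count = 0
    for idx, line in enumerate(src):
        # The first record of each pair goes to first, the next to second.
        target = first if idx % pair_size < lines_per_record else second
        target.write(line)
        count = idx + 1
    if count % pair_size:
        raise ValueError(f"input ends mid-pair after {count} lines")
    return count // pair_size


def deinterleave_files(in_path, out1_path, out2_path):
    """Path-based wrapper around deinterleave()."""
    with open(in_path) as src, open(out1_path, "w") as first, \
            open(out2_path, "w") as second:
        return deinterleave(src, first, second)


if __name__ == "__main__":
    # Tiny made-up sample: one read pair.
    sample = io.StringIO("@r1/1\nAAAA\n+\nIIII\n@r1/2\nCCCC\n+\nIIII\n")
    out1, out2 = io.StringIO(), io.StringIO()
    print(deinterleave(sample, out1, out2), "pair(s) written")
```

Because it reads and writes one line at a time, memory use stays flat even for large files.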
tripleee