arg2 = f'cat <(grep \'#\' temp2.vcf) <(sort <(grep -v \'#\' temp2.vcf) | sortBed -i - | uniq ) > out.vcf'
print(arg2)
try:
    subprocess.call(arg2,shell=True)
except Exception as error:
    print(f'{error}')

When I run this, I get the following error:

/bin/sh: -c: line 0: syntax error near unexpected token `('   
/bin/sh: -c: line 0: `cat <(grep '#' temp2.vcf) <(sort <(grep -v '#' temp2.vcf) | sortBed -i - | uniq ) > out.vcf'

But when I run the same command directly on the command line, it works.

tripleee
AA D
  • Copy this into shellcheck.net and it will spit out the issues. You’ll see that your escaping of the single quote isn’t adequate. – JNevill Mar 14 '22 at 18:13
  • It often helps to simplify things. Extract a [mcve], in particular. Also, please take the [tour] and read [ask]. – Ulrich Eckhardt Mar 14 '22 at 18:15

2 Answers

With shell=True, Python's subprocess.call() runs the command through /bin/sh by default. Process substitution, the <(...) syntax, is supported by bash but not by sh.

$ sh -c "cat <(date)"  
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `cat <(date)'

$ bash -c "cat <(date)"
Mon Mar 14 11:12:48 PDT 2022

If you really need the bash-specific syntax, you should be able to specify the shell with the executable keyword argument (but I have not tried this):

subprocess.call(arg2, shell=True, executable='/bin/bash')
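As a minimal sketch of the same idea, the snippet below runs a small process-substitution command through bash and captures its output. The helper name run_with_bash and the demo command are made up for illustration, and it assumes bash is installed (falling back to /bin/bash if it is not found on PATH).

```python
import shutil
import subprocess


def run_with_bash(command: str) -> str:
    """Run a shell command string through bash and return its standard output."""
    # Assumption: bash is available; /bin/bash is only a fallback guess.
    bash = shutil.which("bash") or "/bin/bash"
    result = subprocess.run(
        command,
        shell=True,
        executable=bash,      # use bash instead of the default /bin/sh
        capture_output=True,
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# Process substitution is parsed by bash, so this does not raise a syntax error.
demo_output = run_with_bash("cat <(echo header) <(echo body)")
print(demo_output, end="")
```

Using check=True also means a failing command raises an exception. subprocess.call() only returns the exit status, so the try/except in the question would not catch the shell's syntax error.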
Bill Karwin
  • I don't know about redundant; the purpose of the first `cat` seems to be to collect the `#` lines followed by the output from the pipe. – tripleee Mar 15 '22 at 07:32
  • yes @tripleee is right. It captures the # and then later excludes it before sorting. But executable='/bin/bash' worked. Thank you so much – AA D Mar 15 '22 at 10:25
  • Thank you for the clarification. I have removed the part of my answer that was not relevant to the solution. – Bill Karwin Mar 15 '22 at 14:15

The immediate error is that your attempt uses Bash-specific syntax. You can work around that with an executable="/bin/bash" keyword argument; but really, why are you using a complex external pipeline here at all? Python can do all these things except sortBed natively.

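import subprocess
import sys

with open("temp2.vcf", "r") as vcfin, open("out.vcf", "w") as vcfout:
    # Start sortBed, feeding it through a pipe and capturing what it prints
    sub = subprocess.Popen(
        ["sortBed", "-i", "-"],
        text=True,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE)

    for line in vcfin:
        if "#" in line:
            # Lines containing '#' (like grep '#') go straight to the output
            vcfout.write(line)
        else:
            sub.stdin.write(line)

    # communicate() closes stdin and collects the subprocess output
    subout, suberr = sub.communicate()

    # suberr stays None unless stderr=subprocess.PIPE is passed to Popen
    if suberr is not None:
        sys.stderr.write(suberr)

    # Drop duplicate lines, like uniq; splitlines() avoids a trailing empty line
    seen = set()
    for line in subout.splitlines():
        if line not in seen:
            vcfout.write(line + "\n")
        seen.add(line)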

The Python reimplementation is slightly clunkier (and untested, as I don't have sortBed or your input data) but that also means it's more obvious where to change something if you want to modify it.

tripleee
  • I would probably still use `uniq` in place of a pure Python reimplementation, something like `subprocess.run(['uniq'], stdin=sub.stdout, stdout=vcfout)` – chepner Mar 14 '22 at 22:12
  • @chepner In that case, perhaps create a single `subprocess` with a pipeline and `shell=True` and simply write into that in the first loop. I guess we can assume that the direct writes to the open file handle will arrive first, though I haven't verified that. – tripleee Mar 15 '22 at 05:30
  • Update: I needed to add an explicit `vcfout.flush()` for that case, to force the stuff written to the pipe before the subprocess `communicate()` to be written out to the file first. – tripleee Mar 15 '22 at 05:44
  • But I require a sorted file to be again sorted using sortbed here. executable="/bin/bash" worked. Thank you so much. – AA D Mar 15 '22 at 10:23
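The variant discussed in the comments above, where the external tools write directly into the output file, could be sketched roughly as follows. This is untested against sortBed. The function name headers_then_pipeline and its pipeline parameter are hypothetical, and unlike the suggestion to write into the pipe during the first loop, this version buffers the non-header lines in memory and hands them to the pipeline afterwards.

```python
import subprocess


def headers_then_pipeline(in_path, out_path, pipeline="sortBed -i - | uniq"):
    """Copy lines containing '#' to out_path, then append the pipeline's output."""
    with open(in_path) as vcfin, open(out_path, "w") as vcfout:
        body = []
        for line in vcfin:
            if "#" in line:
                vcfout.write(line)      # header lines go straight to the file
            else:
                body.append(line)       # everything else is fed to the pipeline

        # Python buffers its own writes; flush so the headers reach the file
        # before the subprocess starts writing to the same file descriptor.
        vcfout.flush()

        # The pipeline uses only plain sh syntax, so the default shell is enough.
        subprocess.run(
            pipeline,
            shell=True,
            input="".join(body),
            stdout=vcfout,
            text=True,
            check=True,
        )
```

Called as headers_then_pipeline("temp2.vcf", "out.vcf"), it would run the default sortBed pipeline. Passing a different pipeline string, for example "sort | uniq", makes it easy to try without sortBed installed.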