I'm trying to run an awk command from inside a Python script to process some text files. The following call prints the last two columns of the input file (last column first) and sorts the result numerically by the second output column. This command works: subprocess.call(["awk",'{print $NF,$(NF-1) | "sort -k 2 -n" }', file2],stdout=f3).

Now I would like to remove the NF column from the sorted output. I added a second pipe as shown below, and it gives me a syntax error at the pipe: subprocess.call(["awk",'{print $NF,$(NF-1) | "sort -k 2 -n" | '$NF="";print'}', file2],stdout=f3)

What am I missing in my syntax?

Anu
  • Are you _really_ attached to having `awk` be responsible for starting `sort`? (If so, do you have a regular shell command that behaves the way you want, so we can tease apart the Python-specific parts of the problem from the general shell+awk ones?) – Charles Duffy Dec 29 '20 at 20:01
  • ...mind, one could also pretty easily have awk do the sorting internally instead of having it start the separate `sort` executable at all. – Charles Duffy Dec 29 '20 at 20:02
  • But then, one could also pretty easily do the whole thing in-process in Python and not need `awk` _or_ `sort` at all. – Charles Duffy Dec 29 '20 at 20:03
  • @tripleee, I'm not sure this is a duplicate of the particular linked original. The pre-modification working code example uses a pipe in a context where it's an awk construct (albeit an awk construct that starts a shell, and one which it's usually a really bad idea to use) rather than a shell construct. – Charles Duffy Dec 30 '20 at 23:41
  • @CharlesDuffy Ack, sorry for not reading through properly. The dupe I selected might be vaguely relevant anyway, though the question isn't particularly good for a canonical; https://stackoverflow.com/questions/24306205/file-not-found-error-when-launching-a-subprocess-containing-piped-commands – tripleee Dec 31 '20 at 05:01

1 Answer

This doesn't work even without Python being involved anywhere; it's an awk problem, not a Python or subprocess problem.

If your shell code was:

awk '{print $NF,$(NF-1) | "sort -k 2 -n" | $NF=""; print}'

...it would still fail with an awk syntax error on the pipe character:

awk: syntax error at source line 1
 context is
    {print(NF),$(NF-1) | "sort -k 2 -n" >>>  | <<<  $NF="";print}
awk: illegal statement at source line 1

By contrast, one could make it work in shell by using a three-process pipeline:

awk '{print $NF, $(NF - 1)}' file2 \
  | sort -nk2 \
  | awk '{ $NF=""; print }' >file3

...and that works fine in Python too:

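import subprocess

with open(file3, 'w') as out:
    p1 = subprocess.Popen(['awk', '{print $NF, $(NF - 1)}', file2],
                          stdout=subprocess.PIPE)
    # Same sort key as the shell pipeline above: numeric, on field 2
    p2 = subprocess.Popen(['sort', '-nk2'],
                          stdin=p1.stdout, stdout=subprocess.PIPE)
    p3 = subprocess.Popen(['awk', '{ $NF=""; print }'],
                          stdin=p2.stdout, stdout=out)

    # Close the parent's copies of the pipe ends so an upstream process
    # can receive SIGPIPE if a downstream one exits early.
    p1.stdout.close()
    p2.stdout.close()
    p3.wait()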

...though it's a lot more trouble than just doing all your logic in native Python, and not needing awk or sort at all:

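with open(file2) as infile:  # each row becomes [second-to-last, last]
    content = [line.split()[-2:] for line in infile if line.strip()]
content.sort(key=lambda x: float(x[0]))  # numeric, like sort -n -k2 in the pipeline
with open(file3, 'w') as outfile:  # keep only the last column, as the pipeline does
    outfile.write(''.join(item[1] + '\n' for item in content))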
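
As a sanity check on what the three-process pipeline above produces, here is a minimal sketch on made-up data. The pipeline ends up emitting the last column of each line, ordered numerically by the second-to-last column. The function name and the sample lines are hypothetical, and the sketch assumes every line has at least two whitespace-separated fields and a numeric second-to-last field:

```python
def last_column_sorted_by_previous(lines):
    """Return the last column of each line, ordered numerically
    by the second-to-last column."""
    # Each row is [second-to-last, last]; blank lines are skipped.
    rows = [line.split()[-2:] for line in lines if line.strip()]
    # float() gives a numeric ordering; a plain string sort would
    # put "10" before "2".
    rows.sort(key=lambda row: float(row[0]))
    return [row[1] for row in rows]


# Hypothetical sample input
sample = ["a 10 x", "b 2 y", "c 33 z"]
print(last_column_sorted_by_previous(sample))  # ['y', 'x', 'z']
```

Note that `sort -n` treats non-numeric keys as 0, whereas `float()` raises `ValueError`, so real input may need extra handling.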
Charles Duffy
  • @Anu, btw -- do see the addendum showing how to do all the work in Python without needing `awk` _or_ `sort`. – Charles Duffy Dec 30 '20 at 16:32
  • Thanks. nice compact soln for python. I might try to use a combo of the above solutions. Say I have 10 cols in my csv. I want to capture 1st col + last 2 cols in my new csv file, sort the new csv file by 1st col, after sorting, remove the 1st col. May be i can create a new csv using popen and use the python soln for the rest. – Anu Dec 30 '20 at 21:51
  • Python has a really great CSV library. I don't know where your data is originally coming from, but in general building CSVs is something it does well. – Charles Duffy Dec 30 '20 at 23:38
  • I used the suggestion provided by Charles. This is my final solution: `subprocess.call(["awk","-F,",'{print $1,$(NF-1),$NF }', file1],stdout=output)` `content = [ line.split(' ',1) for line in open(os.path.abspath(outfile)).readlines() ]` , `content.sort(key=lambda x:x[0])` followed by the writing the file – Anu Dec 31 '20 at 06:56
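
For the CSV variant described in the comments above (keep the first column plus the last two, sort by the first column, then drop it), Python's csv module can probably handle the whole job without awk. A minimal sketch, assuming comma-separated input with at least three fields per row; the helper name `select_and_sort` and the sample data are made up for illustration:

```python
import csv
import io


def select_and_sort(rows):
    """Keep column 1 plus the last two columns, sort by column 1,
    then drop column 1 from the result."""
    # Rows with fewer than three fields are skipped (an assumption).
    picked = [(row[0], row[-2], row[-1]) for row in rows if len(row) >= 3]
    # Plain string sort on the first column; use float(r[0]) as the
    # key instead if that column is numeric.
    picked.sort(key=lambda r: r[0])
    return [list(r[1:]) for r in picked]


# Hypothetical sample data standing in for the real input file
source = io.StringIO("b,1,2,3,4\na,5,6,7,8\n")
rows = list(csv.reader(source))

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(select_and_sort(rows))
print(out.getvalue(), end="")
```

With real files, the `io.StringIO` objects would be replaced by `open(path, newline='')` handles, which is how the csv documentation recommends opening files.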