I have data in ZDT format (like this) and want to run this Python script on the third column only (the pinyin one). I have tried to do this with sed and awk, but without success because of my limited knowledge of these tools. Ideally, I want to feed the column's contents to the Python script and then replace the source with the script's output.

This is roughly what I envision, but the function call is not executed, not even when it is quoted.

s/([a-z]+[1,2,3,4]?)(?=.*\t)/decode_pinyin(\1)/g

I am not particular about the tools used (sed, awk, python, …); I just want a shell script for batch processing a number of files. Ideally, the original spaces would be preserved.

brian-ammon
  • Did you want to store only the third column in a txt file for later processing? – Avinash Raj May 31 '14 at 16:03
  • You should use python for the whole solution – hek2mgl May 31 '14 at 16:04
  • @AvinashRaj No, I want to keep the original content, so that the first column, for example, becomes `入鄉隨俗 入乡随俗 rùxiāng suísú /When in Rome, do as the Romans/` – brian-ammon May 31 '14 at 16:05
  • `sed` can't call functions when it does substitution. But several scripting languages can do it: Perl has the `e` modifier, PHP has `preg_replace_callback()`, Javascript allows the replacement in `RegExp::replace` to be a function, and I'll bet Python has something similar. – Barmar May 31 '14 at 17:28
  • possible duplicate of [Call functions from re.sub](http://stackoverflow.com/questions/11944978/call-functions-from-re-sub) – brian-ammon Jun 01 '14 at 17:11

1 Answer

Try something like this:

awk -F'\t' '{printf "decode_pinyin(\"%s\")\n", $3}' file

This outputs:

decode_pinyin("ru4xiang1 sui2su2")
decode_pinyin("ru4")
decode_pinyin("xiang1")
decode_pinyin("sui2")
decode_pinyin("su2")
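
Note that the awk command above only prints the calls as text; it does not run them. Following the suggestion in the comments to do the whole job in Python, here is a minimal sketch that passes a function as the replacement to `re.sub`. The `decode_pinyin` below is a hypothetical stand-in for the function in the linked script (which is not shown here) and only handles plain lowercase syllables with a trailing tone digit 1-4. The tab-separated layout with pinyin in the third field is assumed from the `-F'\t'` used above.

```python
import re

# Tone-marked forms of each vowel, indexed by tone number 1-4.
TONE_MARKS = {
    "a": "āáǎà",
    "e": "ēéěè",
    "i": "īíǐì",
    "o": "ōóǒò",
    "u": "ūúǔù",
    "ü": "ǖǘǚǜ",
}

# One numbered syllable: lowercase letters followed by a tone digit 1-4.
SYLLABLE = re.compile(r"[a-zü]+[1-4]")


def decode_pinyin(syllable):
    """Hypothetical stand-in: turn e.g. 'xiang1' into 'xiāng'."""
    letters, tone = syllable[:-1], int(syllable[-1])
    # Common placement rule: 'a' or 'e' takes the mark, then the 'o' in 'ou',
    # otherwise the last vowel of the syllable.
    if "a" in letters:
        pos = letters.index("a")
    elif "e" in letters:
        pos = letters.index("e")
    elif "ou" in letters:
        pos = letters.index("o")
    else:
        vowel_positions = [i for i, ch in enumerate(letters) if ch in TONE_MARKS]
        if not vowel_positions:
            return syllable  # nothing to mark, leave untouched
        pos = vowel_positions[-1]
    marked = TONE_MARKS[letters[pos]][tone - 1]
    return letters[:pos] + marked + letters[pos + 1:]


def convert_line(line):
    """Rewrite only the third tab-separated field; everything else is kept."""
    body = line.rstrip("\n")
    newline = line[len(body):]
    fields = body.split("\t")
    if len(fields) >= 3:
        # re.sub accepts a function; it is called once per matched syllable.
        fields[2] = SYLLABLE.sub(lambda m: decode_pinyin(m.group(0)), fields[2])
    return "\t".join(fields) + newline


sample = "入鄉隨俗\t入乡随俗\tru4xiang1 sui2su2\t/When in Rome, do as the Romans/\n"
print(convert_line(sample), end="")
```

Because only the matched syllables are replaced, the spaces inside the pinyin column stay as they were. For batch processing from a shell loop, `convert_line` could be applied to every line of each file, for example with the standard `fileinput` module and `inplace=True`.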
Scrutinizer