5

I have a file which contains lines of the following format:

w1#1#x w2#4#b w3#2#d ...

Each word (token) in the line (e.g. w1#1#x) is made of three parts: the first is an index (w1 in this case), the second is an integer (1 in this case), and the third is a character (x in this case).

Now, for each word (token), I need to print an additional field calculated from the values of the second and third parts (i.e., the 4th part is a function of the 2nd and 3rd parts), so that the output file looks like:

w1#1#x#f1 w2#4#b#f2 w3#2#d#f3 ...

where

f1 = function(1,x), f2 = function(4,b), f3 = function(2,d)

Now, using sed patterns, I can identify the components of every word (token), e.g.,

echo "$line" | sed "s/\([^#]*\)#\([^#]*\)#\([^# ]*\) /\1#\2#\3 /g"

where \2 and \3 are parts of the pattern (I am calling them parts of the pattern because of this link)

Now, I need to compute the 4th part using \2 and \3. I have defined a shell function getInfo() which takes 2 arguments, does the required computation, and gives me back the 4th part. The problem is inserting this function into the sed command. I tried the following:

echo "$line" | sed "s/\([^#]*\)#\([^#]*\)#\([^# ]*\) /\1#\2#\3`getInfo \2 \3` /g"

but this is not working. Shell is not receiving the parts of the pattern as arguments.
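
For illustration, here is a minimal sketch of why the shell never sees the captured text. It assumes a hypothetical stand-in for getInfo that only reports the arguments it receives (the real getInfo is not shown), and it writes the capture groups in escaped BRE form. The backticks are expanded by the shell before sed starts, so the function runs once, with the literal arguments 2 and 3, and its output is pasted into the sed replacement text.

```shell
# Hypothetical stand-in for getInfo: it only reports the arguments it receives.
getInfo() { printf '#[%s,%s]' "$1" "$2"; }

# Same shape as the attempt above, wrapped in a function (name is illustrative).
attempt() {
  # The shell runs the backticks first: getInfo is called once, before sed
  # starts, and the unquoted backslashes are dropped, leaving 2 and 3.
  printf '%s\n' "$1" |
    sed "s/\([^#]*\)#\([^#]*\)#\([^# ]*\) /\1#\2#\3`getInfo \2 \3` /g"
}

attempt 'w1#1#x w2#4#b '
```

Every token gets the same suffix `#[2,3]`, regardless of its own second and third parts, because sed only ever receives the already-computed string.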

So the question is:

How to pass the sed parts of the pattern to a shell (function)?

I can easily write a shell script which would split the line word-by-word, do the required job, and then stitch the file back together, but I would really appreciate it if the shell could receive parts of the pattern as arguments from sed within the sed command.

Regards,

Salil Joshi

Salil

2 Answers

6

This might work for you:

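# Example function: appends the 2nd and 3rd parts as the 4th field
func(){ echo "$1#$2#$3#$2$3"; }
export -f func   # bash-only: makes func visible to the child bash below
echo "w1#1#x w2#4#b w3#2#d" |
sed 's/\([^#]*\)#\([^#]*\)#\([^ ]*\) \?/echo -n "$(func \1 \2 \3) "; /g;s/$/echo ""/' |
bash   # must be bash (not a generic sh) to see the exported function
# prints:
w1#1#x#1x w2#4#b#4b w3#2#d#2d 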

Or if you have GNU sed:

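func(){ echo "$1#$2#$3#$2$3"; }
export -f func
# The e flag executes the substituted line as a shell command;
# the final s/.$// strips the trailing space from its output.
echo "w1#1#x w2#4#b w3#2#d" |
sed 's/\([^#]*\)#\([^#]*\)#\([^ ]*\) \?/echo -n "$(func \1 \2 \3) "; /ge;s/.$//'
# prints:
w1#1#x#1x w2#4#b#4b w3#2#d#2d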
potong
  • 2
    +1 for a clever demonstration that it can be done with `sed` with enough trickery. It is definitely (IMO) pushing beyond the limits of what is sensible, though. In particular, if the data file contained shell metacharacters, then executing the shell function could be dangerous. If the data is solely the multiple-hash entries with simple white space and alphanumerics, you're OK. – Jonathan Leffler Jan 29 '12 at 15:47
  • impressive... like @JonathanLeffler mentioned, even though this might not work in all cases, I will keep this for future reference. Thanks a lot. – Salil Jan 31 '12 at 05:49
  • Note for Darwin users: I had to change the sed syntax a bit (`-E` switch, unescaped regex operators): `echo "w1#1#x w2#4#b w3#2#d" | sed -E 's/([^#]*)#([^#]*)#([^ ]*) ?/echo "$(func \1 \2 \3)";/g;s/$/echo ""/' | sh` – bric3 Jan 13 '14 at 23:11
3

There comes a point at which sed is no longer the correct tool for the job. I think this task has reached that point (but see the clever answer by potong which shows that it can be done with bash and sed).

Which alternative tool do you use? You don't show the function, but if it can be conveniently calculated in the shell with a shell function, the chances are that awk is powerful enough to do the job. I'd probably fall back on Perl myself, but Python (or Ruby) would also work well. All of these allow you to write a function, to read the data and apply the function to the data before writing the data back out.
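
As a rough sketch of the awk route, the snippet below splits each token on `#` and appends a fourth field. The real function is not shown in the question, so the rule used here (concatenate the integer and the character, as in potong's example `func`) is only a placeholder, and the name `add_field` is hypothetical.

```shell
# Hypothetical wrapper: reads lines on stdin, writes augmented lines to stdout.
add_field() {
  awk '{
    for (i = 1; i <= NF; i++) {
      # part[1]=index, part[2]=integer, part[3]=character
      n = split($i, part, "#")
      # Placeholder rule: the 4th field is the 2nd and 3rd parts joined.
      if (n == 3) $i = $i "#" part[2] part[3]
    }
    print
  }'
}

echo "w1#1#x w2#4#b w3#2#d" | add_field
# prints: w1#1#x#1x w2#4#b#4b w3#2#d#2d
```

Because awk does the splitting itself, no data is ever handed to a shell for evaluation; only the line inside the `if` would need to change for a different function.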

The problem with trying to use a function in sed is that it has no mechanism to define functions or to execute shell functions. To use sed, you'd have to think in terms of two passes through the data. The first pass extracts the (unique) tokens; the shell function is then applied to each token to generate a sed script that simply matches each token and substitutes its replacement; the second pass applies that script to the data.
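
The two-pass idea could be sketched roughly as follows. This is an illustration under stated assumptions, not a tested production script: `getInfo` is a hypothetical stand-in (the real one is not shown), the helper names `make_script` and `two_pass` are invented here, and tokens are assumed to contain only alphanumerics and `#`, so they need no regex escaping.

```shell
# Hypothetical stand-in for the real getInfo, which the question does not show.
getInfo() { printf '%s%s' "$1" "$2"; }

# Pass 1: collect the unique tokens and emit one sed substitution per token.
# Assumes tokens hold only alphanumerics and '#', so no escaping is needed.
make_script() {
  tr -s ' ' '\n' | sort -u | while IFS='#' read -r idx num chr; do
    [ -n "$idx" ] || continue            # skip empty lines
    tok="$idx#$num#$chr"
    printf 's| %s | %s#%s |g\n' "$tok" "$tok" "$(getInfo "$num" "$chr")"
  done
}

# Pass 2: surround every token with its own pair of spaces so each rule can
# match whole tokens only, apply the generated rules, then restore spacing.
two_pass() {
  script=$(make_script < "$1")
  sed -e 's/ /  /g; s/^/ /; s/$/ /' \
      -e "$script" \
      -e 's/  */ /g; s/^ //; s/ $//' "$1"
}
```

Calling `two_pass data.txt` (with `data.txt` as a placeholder file name) on a file containing `w1#1#x w2#4#b w1#1#x` calls getInfo once per distinct token rather than once per occurrence, and the shell function only ever runs on tokens the script extracted itself.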

Jonathan Leffler
  • Thank you for the suggestions, Jonathan. Like I mentioned, I can write these functions (even in Python, Ruby etc.), but for a long time I was wondering if sed can export those parts back to the shell. From your answer, the answer seems to be NO :-( – Salil Jan 29 '12 at 14:02
  • 2
    Correct - the answer is (for all practical purposes) No. If it was a life-or-death situation where Perl, Awk, Python could not be used, then a multi-step scheme could be devised using `sed`, `uniq`, `sort`, your shell function, etc. But unless there is an idiotic restriction on the tool set available, `sed` is not the correct answer this time. – Jonathan Leffler Jan 29 '12 at 14:21