Using sed to replace a possibly ambiguous substring

Question

I have been using sed to strip CVS keywords from many, many files, but I have encountered a case where multiple CVS keywords appear on the same line, and I do not have an adequate solution for it. For example, suppose the following line existed in a file:

$Revision: 1.2 $  $Date: 2015/01/06 17:14:53 $

I want the result to look like:

$Revision$  $Date$

However, the sed command I have been using:

sed -i -e 's/\(\$Revision:\).*\( \$\)/\$Revision\$/'

matches the longest (outermost) possible span, which also strips the Date keyword:

$Revision$

Without assuming any particular order (Revision and Date might be swapped) or any particular placement on the line (I can't assume beginning or end of line), how can I strip each keyword independently?

You need non-greedy regexes which are in Perl, but probably not in sed. http://stackoverflow.com/questions/1103149/non-greedy-regex-matching-in-sed — chicks, Jun 30 '15 at 18:46

hek2mgl · Accepted Answer · 2015-06-30T19:47:56.713

3

The basic problem you stumbled upon is that sed matching is greedy, meaning:

sed 's/a.*a/b/' <<< 'a_a_a'

will produce

b

and not(!)

b_a

In many regular expression engines you can use something like a.*?a to perform a non-greedy match, but sed's regular expression language (POSIX basic or extended regular expressions) has no non-greedy quantifier, so you need a negated character class instead:

sed 's/a[^a]*a/b/' <<< 'a_a_a'

which produces b_a.
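To see both behaviours side by side, here is a small sketch (assuming bash for the here-strings and GNU sed) that runs the greedy pattern and the negated-class pattern on the same input:

```shell
#!/usr/bin/env bash
# Compare a greedy match with a negated-character-class match on the
# same input. Assumes bash (for <<<) and GNU sed.
input='a_a_a'

greedy=$(sed 's/a.*a/b/' <<< "$input")      # .* runs on to the LAST 'a'
bounded=$(sed 's/a[^a]*a/b/' <<< "$input")  # [^a]* cannot cross an 'a', so it stops at the NEXT one

printf 'greedy:  %s\n' "$greedy"    # greedy:  b
printf 'bounded: %s\n' "$bounded"   # bounded: b_a
```

The same idea carries over to CVS keywords: `[^$]*` cannot run past the `$` that closes a keyword.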

Based on that, you can use the following sed command:

sed -r 's/(\$Revision):[^$]+\$/\1$/' input.file

Note: Always check that it works before you use the -i option.
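One way to do that check, sketched under the assumption of GNU sed and a throwaway temp file standing in for your real files, is a dry run followed by a diff, and only then the in-place edit:

```shell
#!/usr/bin/env bash
# Preview a sed edit before committing it with -i.
# A temp file stands in for a real source file; GNU sed assumed.
file=$(mktemp)
trap 'rm -f "$file"' EXIT
printf '%s\n' '$Revision: 1.2 $  $Date: 2015/01/06 17:14:53 $' > "$file"

script='s/(\$Revision):[^$]+\$/\1$/'

# 1. Dry run: print the result without touching the file.
sed -r "$script" "$file"

# 2. Show what would change as a unified diff (diff exits 1 when they differ).
sed -r "$script" "$file" | diff -u "$file" - || true

# 3. Only once the preview looks right, edit in place.
sed -r -i "$script" "$file"
cat "$file"   # $Revision$  $Date: 2015/01/06 17:14:53 $
```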

To replace a specific list of tags, use

sed -r 's/\$(Revision|Date|AndSoOn):[^$]+\$/$\1$/g' input.file
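Applied to the line from the question, this command handles either keyword order because the `g` flag repeats the match across the line. A minimal sketch (GNU sed assumed; `strip_listed` is a hypothetical helper name, and `AndSoOn` is the answer's own placeholder):

```shell
#!/usr/bin/env bash
# Wrap the tag-list command in a function and try it on both orders.
# GNU sed (-r) assumed; strip_listed is a hypothetical helper name.
strip_listed() {
  sed -r 's/\$(Revision|Date|AndSoOn):[^$]+\$/$\1$/g'
}

strip_listed <<< '$Revision: 1.2 $  $Date: 2015/01/06 17:14:53 $'
# $Revision$  $Date$
strip_listed <<< '$Date: 2015/01/06 17:14:53 $  $Revision: 1.2 $'
# $Date$  $Revision$
strip_listed <<< '$Author: zk $'
# $Author: zk $   (not in the list, so left unchanged)
```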

To apply it to all CVS tags, use:

sed -r 's/(\$[^:]+):[^$]+\$/\1$/g' input.file
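Here is a quick sketch of the generic variant on a couple of made-up lines (GNU sed assumed; `strip_all` is a hypothetical helper name). `[^:]+` stops at the first colon, so values that themselves contain colons, such as timestamps, are still consumed by `[^$]+`, and keywords embedded mid-line are handled too:

```shell
#!/usr/bin/env bash
# Generic variant: collapse every "$Name: value $" to "$Name$".
# GNU sed (-r) assumed; sample lines are invented for illustration.
strip_all() {
  sed -r 's/(\$[^:]+):[^$]+\$/\1$/g'
}

strip_all <<< '$Id: foo.c,v 1.2 2015/01/06 17:14:53 zk Exp $'
# $Id$
strip_all <<< 'x = 1; /* $Date: 2015/01/06 17:14:53 $ $Revision: 1.2 $ */'
# x = 1; /* $Date$ $Revision$ */
```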

edited Jun 30 '15 at 19:47

answered Jun 30 '15 at 19:11

hek2mgl


You can improve the sed command for _all CVS tags_ with `sed -r 's/(\$[^:]+):[^$]+/\1/g' input.file` – Rakholiya Jenish Jun 30 '15 at 19:21
@RakholiyaJenish Actually it should be: `sed -r 's/(\$[^:]+):[^$]+\$/\1\$/g' file` :) .. But will add that! – hek2mgl Jun 30 '15 at 19:23
Can you explain why `\$` is required after `[^$]+`? – Rakholiya Jenish Jun 30 '15 at 19:24

Because it might otherwise produce false matches like `$Revision: foobar` <-- without the closing `$`. I mean, it is source code! We should be careful, shouldn't we? :) – hek2mgl Jun 30 '15 at 19:26
Thanks for explanation. – Rakholiya Jenish Jun 30 '15 at 19:27
Worked like a charm! Thanks. – Z K Jun 30 '15 at 19:57
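The false match described in the comments can be reproduced directly. This sketch (GNU sed assumed) runs both "all tags" variants on the commenters' example `$Revision: foobar`, which lacks a closing `$`:

```shell
#!/usr/bin/env bash
# Why the closing \$ matters: compare the two "all tags" commands from the
# comments on a line that is not a complete keyword. GNU sed (-r) assumed.
line='$Revision: foobar'

loose=$(sed -r 's/(\$[^:]+):[^$]+/\1/g' <<< "$line")       # no closing \$ required
strict=$(sed -r 's/(\$[^:]+):[^$]+\$/\1\$/g' <<< "$line")  # closing \$ required

printf '%s\n' "$loose"    # $Revision            (": foobar" silently deleted)
printf '%s\n' "$strict"   # $Revision: foobar    (unchanged)
```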

Cyrus · Answer 2 · 2015-06-30T19:19:42.607

1

Try this with GNU sed:

sed -i '/^\$Revision/{s/:[^$]*\$/$/g}' file

Resulting line in file:

$Revision$  $Date$
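Note that the address `/^\$Revision/` only selects lines that begin with `$Revision`, whereas the question rules out assumptions about order and placement. A small sketch (GNU sed assumed, printing instead of using `-i`; `cyrus_strip` is a hypothetical helper name) shows a line with `$Date` first being left alone:

```shell
#!/usr/bin/env bash
# The /^\$Revision/ address restricts the substitution to lines that START
# with $Revision. GNU sed assumed; output is printed rather than using -i.
cyrus_strip() {
  sed '/^\$Revision/{s/:[^$]*\$/$/g}'
}

cyrus_strip <<< '$Revision: 1.2 $  $Date: 2015/01/06 17:14:53 $'
# $Revision$  $Date$
cyrus_strip <<< '$Date: 2015/01/06 17:14:53 $  $Revision: 1.2 $'
# $Date: 2015/01/06 17:14:53 $  $Revision: 1.2 $   (address not matched, unchanged)
```

Dropping the address and keeping only the `s/:[^$]*\$/$/g` part would process every line, at the cost of touching any colon that is followed by a `$`.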

edited Jun 30 '15 at 19:19

answered Jun 30 '15 at 19:13

Cyrus

