How do I write a SED regex to extract a string delimited by another string?

Question

I am using GNU sed version 4.2.1 and I am trying to write a non-greedy SED regex to extract a string that is delimited by two other strings. This is easy when the delimiting strings are single characters:

s:{\([^}]*\)}:\1:g

In that example the string is delimited by '{' on the left and '}' on the right.

If the delimiting strings are multiple characters, say '{{{' and '}}}' I can adjust the above expression like this:

s:{{{\([^}}}]*\)}}}:\1:g

so the centre expression matches anything not containing the '}}}' closing string. But this only works if the match string does not contain '}' at all. Something like:

{{{cannot match {this broken} example}}}

will not work but

{{{can match this example}}}

does work. Of course

s:{{{\(.*\)}}}:\1:g

always works but is greedy so isn't suitable where multiple patterns occur on the same line.

I understand [^a] to mean anything except a and [^ab] to mean anything except a or b so, despite it appearing to work, I don't think [^}}}] is the correct way to exclude that sequence of 3 consecutive characters.

So how do I write a regex for SED that matches a string that is delimited by two other strings?

score 1 · Accepted Answer · edited May 23 '17 at 11:49

You are correct that [^}}}] doesn't work. A negated character class matches any single character that is not one of the characters inside it. Repeating a character doesn't change the logic, so what you wrote is the same as [^}]. (That is also why it appears to work when there are no braces inside the delimited text.)
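A quick way to convince yourself of this is to run both classes over the two sample lines from the question. This is only a sketch: the function names strip_tripled and strip_single are made up for the illustration, and it assumes a sed that treats { and } as literals in basic regular expressions, as GNU sed does.

```shell
# Hypothetical helper names; each wraps one of the two expressions being compared.
strip_tripled() { sed 's:{{{\([^}}}]*\)}}}:\1:g'; }
strip_single()  { sed 's:{{{\([^}]*\)}}}:\1:g'; }

# Run both expressions over the two sample lines from the question.
for line in '{{{can match this example}}}' '{{{cannot match {this broken} example}}}'; do
    a=$(printf '%s\n' "$line" | strip_tripled)
    b=$(printf '%s\n' "$line" | strip_single)
    # Print the two results side by side; they should be identical on every line.
    printf '%s | %s\n' "$a" "$b"
done
```

Both columns should come out the same: the first line loses its delimiters, and the second line is left untouched because the inner } stops the class before the closing }}} is reached.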

In Perl and compatible regular expressions, you can use ? to make a * or + non-greedy:

 s:{{{(.*?)}}}:$1:g

This will always match the first }}} after the opening {{{.

However, sed does not support non-greedy quantifiers. In fact, I don't think there is any way in sed of doing this match. The only other way to do it is to use advanced features like look-ahead, which sed also does not have.

You can easily use Perl in a sed-like fashion with the -pe options, which cause it to take a single line of code from the command line (-e) and automatically loop over each line and print the result (-p).

perl -pe 's:{{{(.*?)}}}:$1:g'

The -i option for in-place editing of files is also useful, but make sure your regex is correct first!

For more information see perlrun.

Thank you for your answer - it's what I suspected, as I knew about sed not being able to look ahead. I found that I didn't need to escape the capture group in your example: `'s:{{{(.*?)}}}:$1:g'` (in fact, when I did, it didn't work). — starfry, Mar 05 '13 at 13:00
@starfry, oops, you are right about the capture group. That was a typo. — , Mar 05 '13 at 13:27

Scrutinizer · Answer 2 · 2013-03-05T14:40:13.157

0

With sed you could do something like:

sed -e :a -e 's/\(.*\){{{\(.*\)}}}/\1\2/ ; ta'

With:

{{{can match this example}}} {{{can match this 2nd example}}}

This gives:

can match this example can match this 2nd example

It is not lazy matching, but by replacing from right to left we can make use of sed's greediness.
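The right-to-left behaviour is easier to see by comparing a single pass with the full loop. The sample line and variable names below are my own, and the sketch assumes the delimiters are balanced on each line; I have not considered stray or nested {{{ / }}} here.

```shell
line='{{{one}}} {{{two {inner} text}}} {{{three}}}'

# A single pass: the leading \(.*\) is greedy, so it swallows everything up to
# the rightmost {{{ and only that last pair of delimiters is removed.
once=$(printf '%s\n' "$line" | sed -e 's/\(.*\){{{\(.*\)}}}/\1\2/')

# With the :a label and the "ta" branch, the substitution is repeated until it
# no longer matches, removing one pair per pass from right to left.
all=$(printf '%s\n' "$line" | sed -e :a -e 's/\(.*\){{{\(.*\)}}}/\1\2/' -e ta)

printf '%s\n%s\n' "$once" "$all"
```

After one pass only "three" has lost its delimiters; after the loop finishes, all three strings are bare and the single braces around "inner" are left alone.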

edited Mar 05 '13 at 14:40

answered Mar 05 '13 at 14:26
