0

I am using GNU sed version 4.2.1 and I am trying to write a non-greedy sed regex to extract a string that is delimited by two other strings. This is easy when the delimiting strings are single characters:

s:{\([^}]*\)}:\1:g

In that example the string is delimited by '{' on the left and '}' on the right.

If the delimiting strings are multiple characters, say '{{{' and '}}}' I can adjust the above expression like this:

s:{{{\([^}}}]*\)}}}:\1:g

so the centre expression matches anything not containing the '}}}' closing string. But this only works if the match string does not contain '}' at all. Something like:

{{{cannot match {this broken} example}}}

will not work but

{{{can match this example}}}

does work. Of course

s:{{{\(.*\)}}}:\1:g

always matches, but it is greedy, so it isn't suitable where multiple patterns occur on the same line.
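For illustration, here is what the greedy version does with two delimited strings on one line (the sample input is made up for this sketch):

```shell
# Illustrative input: two delimited strings on the same line.
line='{{{one}}} and {{{two}}}'

# Greedy .* runs to the LAST }}} on the line, so only the outermost
# pair of delimiters is removed.
greedy=$(printf '%s\n' "$line" | sed 's:{{{\(.*\)}}}:\1:g')

printf '%s\n' "$greedy"
# prints: one}}} and {{{two
```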

I understand [^a] to mean anything except a and [^ab] to mean anything except a or b so, despite it appearing to work, I don't think [^}}}] is the correct way to exclude that sequence of 3 consecutive characters.

So how do I write a regex for sed that matches a string that is delimited by two other strings?

starfry

2 Answers

1

You are correct that [^}}}] doesn't work. A negated character class matches anything that is not one of the characters inside it. Repeating characters doesn't change the logic. So what you wrote is the same as [^}]. (It is easy to see why this works when there are no braces inside the expression).
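A quick way to convince yourself of the equivalence is to run both classes over the two lines from the question. This is only a sketch; the variable names are made up for illustration:

```shell
# The two sample lines from the question.
good='{{{can match this example}}}'
bad='{{{cannot match {this broken} example}}}'

# Class written with three braces, as in the question.
triple_good=$(printf '%s\n' "$good" | sed 's:{{{\([^}}}]*\)}}}:\1:g')
triple_bad=$(printf '%s\n' "$bad" | sed 's:{{{\([^}}}]*\)}}}:\1:g')

# Class written with a single brace.
single_good=$(printf '%s\n' "$good" | sed 's:{{{\([^}]*\)}}}:\1:g')
single_bad=$(printf '%s\n' "$bad" | sed 's:{{{\([^}]*\)}}}:\1:g')

# Both classes strip the delimiters from the first line and leave the
# second line unchanged, because the inner } stops the match.
printf '%s\n' "$triple_good" "$single_good" "$triple_bad" "$single_bad"
```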

In Perl and compatible regular expressions, you can use ? to make a * or + non-greedy:

s:{{{(.*?)}}}:$1:g

This will always match the first }}} after the opening {{{.

However, this is not possible in Sed, whose regular expressions have no non-greedy quantifiers. In fact, I don't think there is any way of doing this match with a single Sed regex. The only other way to do this is to use advanced features like look-ahead, which Sed also does not have.

You can easily use Perl in a sed-like fashion with the -pe options, which cause it to take a single line of code from the command line (-e) and automatically loop over each line and print the result (-p).

perl -pe 's:{{{(.*?)}}}:$1:g'

The -i option for in-place editing of files is also useful, but make sure your regex is correct first!

For more information see perlrun.

Community
  • thank you for your answer - it's what I suspected, as I knew about sed not being able to look ahead. I found that I didn't need to escape the capture group in your example: `'s:{{{(.*?)}}}:$1:g'` (in fact, when I did, it didn't work). – starfry Mar 05 '13 at 13:00
  • @starfry, oops, you are right about the capture group. That was a typo. –  Mar 05 '13 at 13:27
0

With sed you could do something like:

sed -e :a -e 's/\(.*\){{{\(.*\)}}}/\1\2/ ; ta'

With:

{{{can match this example}}} {{{can match this 2nd example}}}

This gives:

can match this example can match this 2nd example

It is not lazy matching, but by replacing from right to left we can make use of sed's greediness.
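To see how the loop behaves on input with braces inside the delimiters, and where it differs from true lazy matching, here is a small sketch. The sample inputs are made up for illustration, and ta is passed as its own -e expression, which should be equivalent to the form above:

```shell
# Illustrative input: two delimited strings, the first containing braces.
line='{{{cannot match {this broken} example}}} and {{{a second}}}'

# Each pass strips the RIGHTMOST {{{...}}} pair: the leading \(.*\) greedily
# swallows everything up to the last {{{ on the line. "ta" jumps back to
# label "a" for as long as a substitution was made.
looped=$(printf '%s\n' "$line" | sed -e :a -e 's/\(.*\){{{\(.*\)}}}/\1\2/' -e ta)
printf '%s\n' "$looped"
# prints: cannot match {this broken} example and a second

# Difference from lazy matching: a stray }}} after the last opener is the
# one that gets consumed, because the second \(.*\) is still greedy.
stray=$(printf '%s\n' '{{{a}}} x }}}' | sed -e :a -e 's/\(.*\){{{\(.*\)}}}/\1\2/' -e ta)
printf '%s\n' "$stray"
# prints: a}}} x  (followed by a trailing space)
```

So the loop gives the same result as a lazy match when the delimiters are balanced, but not necessarily when an unpaired closing delimiter appears on the line.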

Scrutinizer