Multiple lazy string replacing between two patterns with sed

Question

Example:

This (word1) is a test (word2) file.

What I want:

This is a test file.

The problem is that the brackets occur more than once, so if I use:

sed 's/<.*>//g'

I get "This file.", which is wrong.

How about if I want to replace the string between two same patterns?

Like:

WORD1 %WORD2% WORD3 => WORD1 WORD3

Exactly. But the parentheses are just a very simple example, it could also be more than one symbol like #/to be replaced/# or %to be replaced% — Lobby2, Dec 16 '15 at 12:46
@Lobby2: Again, why *same patterns*? Where are the identical parts? What do you expect as an output for `WORD1 %WORD2% WORD3 something WORD1 %WORD2% WORD3`? — Wiktor Stribiżew, Dec 16 '15 at 12:56
The nominated duplicate specifically answers that case (too). Please review existing questions before posting here. Thanks. — tripleee, Dec 16 '15 at 12:56
@tripleee: Not all of those answers work with BRE regex. In BRE, there is no `+` support. No lazy quantifiers. And the update is still a bit unclear now. — Wiktor Stribiżew, Dec 16 '15 at 12:58
Adding a BRE answer to the (wannabe) canonical question would be a very welcome addition indeed. However, because the [regex] tag is so full of gunk, it's hard to point to a single preferred canonical question. If you can point to a better duplicate, you can do that -- flag for moderator attention to redirect the duplicate notice. — tripleee, Dec 16 '15 at 13:00
I have just found out that `\+` works, so it is not pure BRE. Here is a potentially [another duplicate](http://stackoverflow.com/questions/10613643/replace-a-unknown-string-between-two-known-strings-with-sed). — Wiktor Stribiżew, Dec 16 '15 at 13:03
@stribizhev That's an excellent one! Now just leave it to OP to decide which one to upvote (or maybe both). — tripleee, Dec 16 '15 at 13:03
@stribizhev: WORD1 WORD3 as output. But I find out that also works with sed 's/%[^%%]*%//g' — Lobby2, Dec 16 '15 at 13:04
There is no need repeating `%` inside the character class. A `[...]` construct only matches 1 single character from the set specified inside the square brackets. — Wiktor Stribiżew, Dec 16 '15 at 13:09

Wiktor Stribiżew · Accepted Answer · 2015-12-16T14:36:50.170

4

All you need is a negated character class [^<>]* that will match any character but a < or >:

sed 's/<[^<>]*>//g'

Or, if you have round brackets, you can use [^()]* (note that in BRE syntax a literal ( or ) needs no backslash escape; \( and \) are the grouping operators):

sed 's/([^()]*)//g'

See IDEONE demo
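
To make the difference concrete, here is a minimal sketch comparing the greedy pattern with the negated character class. The sample line and the variable names are made up for illustration, and the third command (which also swallows trailing spaces) is an optional extra, not part of the answer above.

```shell
#!/bin/sh
# Hypothetical sample line with two bracketed words.
line='This <word1> is a test <word2> file.'

# Greedy: .* runs from the first < to the last >, swallowing the text in between.
greedy=$(printf '%s\n' "$line" | sed 's/<.*>//g')

# Negated class: [^<>]* cannot cross a bracket, so each pair is matched on its own.
fixed=$(printf '%s\n' "$line" | sed 's/<[^<>]*>//g')

# Optional: also remove the spaces that follow each removed pair.
tidy=$(printf '%s\n' "$line" | sed 's/<[^<>]*> *//g')

printf '%s\n' "$greedy"   # This  file.           (two spaces remain)
printf '%s\n' "$fixed"    # This  is a test  file. (doubled spaces remain)
printf '%s\n' "$tidy"     # This is a test file.
```

Removing only the bracketed part leaves the spaces on both sides behind, which is why the last variant adds ` *` after the closing bracket.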

As for the update, you can remove everything from WORD1 till WORD3 using .*, but only if there is only one set of WORD1 and WORD3 (demo):

echo "WORD1 %WORD2% WORD3" | sed 's/WORD1.*WORD3/WORD1 WORD3/g'

sed supports neither lookarounds (a lookahead would be needed here) nor lazy quantifiers, so the match cannot be restricted to the nearest WORD3 after each WORD1. If you know for sure there is no % symbol in between, you can still use the negated character class approach (demo):

echo "WORD1 %WORD2% WORD3" | sed 's/%[^%]*%//g'
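
As a rough illustration of why the "only one set" caveat matters, the sketch below runs both commands on a line that contains two WORD1 ... WORD3 sets. The input line and variable names are assumptions chosen for the example.

```shell
#!/bin/sh
# Hypothetical input with two WORD1 ... WORD3 sets on one line.
line='WORD1 %WORD2% WORD3 some text WORD1 %WORD2% WORD3'

# Greedy .* spans from the first WORD1 to the last WORD3, so "some text" is lost.
greedy=$(printf '%s\n' "$line" | sed 's/WORD1.*WORD3/WORD1 WORD3/g')

# The negated class removes each %...% pair independently.
result=$(printf '%s\n' "$line" | sed 's/%[^%]*%//g')

printf '%s\n' "$greedy"   # WORD1 WORD3
printf '%s\n' "$result"   # WORD1  WORD3 some text WORD1  WORD3
```

Note that the second command leaves two spaces where each %WORD2% used to be, since only the delimited part is removed.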

A generic solution is to do it in several steps:

1. Replace the starting and ending delimiters with unused characters (<UC1> and <UC2>). I am using Russian letters here, but it should really be some control character.
2. Use the negated character class <UC1>[^<UC1><UC2>]*<UC2> to match each delimited span and substitute the necessary replacement string.
3. Restore the initial delimiters.

Here is an example:

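#!/bin/bash
# Swap the multi-character delimiters for single placeholder characters,
# remove what lies between each pair, then restore the delimiters.
echo "WORD1 %WORD2% WORD3 some text WORD1 %WORD2% WORD3" |
  sed 's/WORD1/й/g' |
  sed 's/WORD3/ч/g' |
  sed 's/й[^йч]*ч/й ч/g' |
  sed 's/й/WORD1/g' |
  sed 's/ч/WORD3/g'
# => WORD1 WORD3 some text WORD1 WORD3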

I am hardcoding a space, but it can be adjusted whenever necessary.
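
The same steps can be sketched with control characters as placeholders, as suggested in the first step. This is only one possible variant: it assumes the bytes SOH (octal 001) and STX (octal 002) never occur in the input, and the variable names `s`, `e`, `line` and `result` are made up for the example.

```shell
#!/bin/sh
# Placeholders: SOH and STX control bytes, assumed to be absent from the input.
s=$(printf '\001')
e=$(printf '\002')

line='WORD1 %WORD2% WORD3 some text WORD1 %WORD2% WORD3'

# One sed process, five substitutions applied in order:
#   1-2: swap the delimiters for the single-byte placeholders
#   3:   remove everything between each placeholder pair (keeping one space)
#   4-5: restore the original delimiters
result=$(printf '%s\n' "$line" | sed \
  -e "s/WORD1/${s}/g" \
  -e "s/WORD3/${e}/g" \
  -e "s/${s}[^${s}${e}]*${e}/${s} ${e}/g" \
  -e "s/${s}/WORD1/g" \
  -e "s/${e}/WORD3/g")

printf '%s\n' "$result"   # WORD1 WORD3 some text WORD1 WORD3
```

Single-byte placeholders avoid depending on a UTF-8 locale for the bracket expression, and passing several `-e` expressions to one `sed` avoids spawning five processes.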

edited Dec 16 '15 at 14:36

answered Dec 16 '15 at 12:12

Wiktor Stribiżew


Now I have another problem: how about if I wanna replace the string between two same patterns? Like WORD1 %WORD2% WORD3 => WORD1 WORD3? – Lobby2 Dec 16 '15 at 12:30
These are not the same if you mean you have known `WORD1` and `WORD3` and you need to remove all between them. Maybe you need [this](http://ideone.com/arFuZx). – Wiktor Stribiżew Dec 16 '15 at 12:40
This is a very common question. Please refrain from answering if you do not have the time to hunt down a good duplicate. – tripleee Dec 16 '15 at 12:41
You have a gold badge in the regex tag. Your answer contains no lookarounds. – tripleee Dec 16 '15 at 12:43
If you are referring to the OP's follow-up question; no, I am ignoring that. If the OP has a new question, they should post a new question, or edit the current question. – tripleee Dec 16 '15 at 12:48
@triplee: sorry I´m just new here, now I know to update the question :P – Lobby2 Dec 16 '15 at 12:50
