
Example:

This (word1) is a test (word2) file.

What I want:

This is a test file.

The problem is that the brackets occur more than once, so if I use:

sed 's/<.*>//g'

I get "This file.", which is wrong.


What if I want to replace the string between two same patterns?

Like:

WORD1 %WORD2% WORD3 => WORD1 WORD3
Asked by Lobby2, edited by Wiktor Stribiżew
  • so you want to remove all text inside parentheses? – fedorqui Dec 16 '15 at 12:19
  • Exactly. But the parentheses are just a very simple example; the delimiter could also be more than one symbol, like #/to be replaced/# or %to be replaced% – Lobby2 Dec 16 '15 at 12:46
  • Please update the question with more details. – Wiktor Stribiżew Dec 16 '15 at 12:51
  • @Lobby2: Again, why *same patterns*? Where are the identical parts? What do you expect as an output for `WORD1 %WORD2% WORD3 something WORD1 %WORD2% WORD3`? – Wiktor Stribiżew Dec 16 '15 at 12:56
  • The nominated duplicate specifically answers that case (too). Please review existing questions before posting here. Thanks. – tripleee Dec 16 '15 at 12:56
  • @tripleee: Not all of those answers work with BRE regex. In BRE, there is no `+` support. No lazy quantifiers. And the update is still a bit unclear now. – Wiktor Stribiżew Dec 16 '15 at 12:58
  • Adding a BRE answer to the (wannabe) canonical question would be a very welcome addition indeed. However, because the [regex] tag is so full of gunk, it's hard to point to a single preferred canonical question. If you can point to a better duplicate, you can do that -- flag for moderator attention to redirect the duplicate notice. – tripleee Dec 16 '15 at 13:00
  • I have just found out that `\+` works, so it is not pure BRE. Here is potentially [another duplicate](http://stackoverflow.com/questions/10613643/replace-a-unknown-string-between-two-known-strings-with-sed). – Wiktor Stribiżew Dec 16 '15 at 13:03
  • @stribizhev That's an excellent one! Now just leave it to OP to decide which one to upvote (or maybe both). – tripleee Dec 16 '15 at 13:03
  • @stribizhev: WORD1 WORD3 as output. But I found out that this also works: sed 's/%[^%%]*%//g' – Lobby2 Dec 16 '15 at 13:04
  • There is no need repeating `%` inside the character class. A `[...]` construct only matches 1 single character from the set specified inside the square brackets. – Wiktor Stribiżew Dec 16 '15 at 13:09

1 Answer


All you need is a negated character class [^<>]* that will match any character but a < or >:

sed 's/<[^<>]*>//g'
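A small sketch of why the greedy pattern from the question fails while the negated class works. It assumes the sample line is written with angle brackets. The ` *` in the last command is an assumed tweak, not part of the answer: it also removes the space before each chunk so the result matches the single-spaced output the question asks for.

```shell
#!/bin/sh
# Assumed input: the question's sample line, written with angle brackets.
line='This <word1> is a test <word2> file.'

# Greedy .* runs from the first < to the LAST >, so " is a test " goes too.
printf '%s\n' "$line" | sed 's/<.*>//g'        # This  file.

# [^<>]* cannot cross a < or >, so each <...> chunk is removed on its own.
printf '%s\n' "$line" | sed 's/<[^<>]*>//g'    # This  is a test  file.

# Assumed tweak: " *" also swallows the spaces in front of each chunk.
printf '%s\n' "$line" | sed 's/ *<[^<>]*>//g'  # This is a test file.
```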

Or, if you have round brackets, you can use [^()]* (note that in BRE syntax a literal ( or ) does not need to be escaped with \):

sed 's/([^()]*)//g'

See IDEONE demo
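The escaping rule differs between the two regex flavours, which is easy to trip over when switching to `sed -E`. The sketch below assumes a sed that accepts `-E` for ERE (GNU and BSD sed both do); as above, the leading ` *` is an assumed addition to keep single spacing.

```shell
#!/bin/sh
line='This (word1) is a test (word2) file.'

# BRE (plain sed): ( and ) are literal characters; \( \) would form a group.
printf '%s\n' "$line" | sed 's/ *([^()]*)//g'       # This is a test file.

# ERE (sed -E): ( ) are grouping operators, so escape them to match literally.
# Inside a bracket expression [...] they are literal in both flavours.
printf '%s\n' "$line" | sed -E 's/ *\([^()]*\)//g'  # This is a test file.
```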

As for the update, you can remove everything between WORD1 and WORD3 using .*, but because .* is greedy this only works if the line contains a single WORD1 ... WORD3 pair (demo):

echo "WORD1 %WORD2% WORD3" | sed 's/WORD1.*WORD3/WORD1 WORD3/g'
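To illustrate the single-pair restriction, here is the same command run on a line with two pairs (the input Wiktor asks about in the comments). The greedy .* jumps from the first WORD1 to the last WORD3, so the text between the pairs is lost.

```shell
#!/bin/sh
# One pair: works as intended.
echo "WORD1 %WORD2% WORD3" | sed 's/WORD1.*WORD3/WORD1 WORD3/g'
# => WORD1 WORD3

# Two pairs: .* matches from the FIRST WORD1 to the LAST WORD3,
# so " some text " between the pairs is deleted as well.
echo "WORD1 %WORD2% WORD3 some text WORD1 %WORD2% WORD3" |
  sed 's/WORD1.*WORD3/WORD1 WORD3/g'
# => WORD1 WORD3
```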

With sed, it is not possible to use lookarounds (lookaheads here) or lazy quantifiers to restrict each match to the nearest WORD3 occurrence. However, if you know for sure there is no % symbol in between, you can still use the negated character class approach (demo):

echo "WORD1 %WORD2% WORD3" | sed 's/%[^%]*%//g'
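Unlike the greedy .* version, the negated class handles several pairs on one line, because [^%]* can never run past a closing %. The sketch assumes % only ever appears in matched pairs; the ` *` variant is an assumed tweak that also removes the preceding space.

```shell
#!/bin/sh
line='WORD1 %WORD2% WORD3 some text WORD1 %WORD2% WORD3'

# Each %...% chunk is removed on its own; two spaces remain at each spot.
printf '%s\n' "$line" | sed 's/%[^%]*%//g'
# => WORD1  WORD3 some text WORD1  WORD3

# Assumed tweak: also eat the spaces before each chunk.
printf '%s\n' "$line" | sed 's/ *%[^%]*%//g'
# => WORD1 WORD3 some text WORD1 WORD3
```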

A generic solution is to do it in several steps:

  • replace the starting and ending delimiters with characters that never occur in the input (<UC1> and <UC2>); I am using Russian letters here, but ideally these should be control characters
  • use the negated character class <UC1>[^<UC1><UC2>]*<UC2> to match each delimited chunk and replace it with the necessary replacement string
  • restore the initial delimiters.

Here is an example:

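#!/bin/bash
# 1) swap WORD1/WORD3 for placeholder letters absent from the input,
# 2) replace each placeholder pair and what lies between it,
# 3) restore the original delimiters.
echo "WORD1 %WORD2% WORD3 some text WORD1 %WORD2% WORD3" |
  sed 's/WORD1/й/g' |
  sed 's/WORD3/ч/g' |
  sed 's/й[^йч]*ч/й ч/g' |
  sed 's/й/WORD1/g' |
  sed 's/ч/WORD3/g'
# => WORD1 WORD3 some text WORD1 WORD3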

I am hardcoding a single space as the replacement, but it can be adjusted as needed.
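Following the suggestion to use control characters rather than letters, the same three steps can be wrapped in a reusable function. This is a hypothetical helper, not part of the answer: `strip_between` is an illustrative name, \036 and \037 are assumed never to appear in the input, and START, END and the replacement are assumed to contain no characters that are special to sed (such as /, &, \, ., * or [).

```shell
#!/bin/bash
# Hypothetical helper: strip_between START END [REPLACEMENT]
# Reads stdin; REPLACEMENT defaults to a single space.
strip_between() {
  local s="$1" e="$2" repl="${3- }"
  local a b
  a=$(printf '\036')   # placeholder for START (assumed absent from input)
  b=$(printf '\037')   # placeholder for END (assumed absent from input)
  # sed runs the -e expressions in order on each line, like the pipeline above.
  sed -e "s/$s/$a/g" \
      -e "s/$e/$b/g" \
      -e "s/$a[^$a$b]*$b/$a$repl$b/g" \
      -e "s/$a/$s/g" \
      -e "s/$b/$e/g"
}

echo "WORD1 %WORD2% WORD3 some text WORD1 %WORD2% WORD3" | strip_between WORD1 WORD3
# => WORD1 WORD3 some text WORD1 WORD3
```

Using two distinct placeholders keeps the bracket expression simple: [^$a$b]* stops at either delimiter, so each match is confined to one START ... END pair.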

Answered by Wiktor Stribiżew
  • Now I have another problem: what if I want to replace the string between two same patterns? Like WORD1 %WORD2% WORD3 => WORD1 WORD3? – Lobby2 Dec 16 '15 at 12:30
  • These are not the same if you mean you have known `WORD1` and `WORD3` and you need to remove all between them. Maybe you need [this](http://ideone.com/arFuZx). – Wiktor Stribiżew Dec 16 '15 at 12:40
  • This is a very common question. Please refrain from answering if you do not have the time to hunt down a good duplicate. – tripleee Dec 16 '15 at 12:41
  • You have a gold badge in the regex tag. Your answer contains no lookarounds. – tripleee Dec 16 '15 at 12:43
  • If you are referring to the OP's follow-up question; no, I am ignoring that. If the OP has a new question, they should post a new question, or edit the current question. – tripleee Dec 16 '15 at 12:48
  • @tripleee: Sorry, I'm new here; now I know to update the question :P – Lobby2 Dec 16 '15 at 12:50