-1

I am trying to replace the text between two strings in a file with the command below. There can be any number of such patterns in the file; this is just an example.

sed 's/word1.*word2/word1/' 1.txt 

In the sample file I'm testing, 'word1' followed by 'word2' occurs twice. Content of 1.txt:

word1---sjdkkdkjdk---word2 I want this text----word1---jhfnkfnsjkdnf----word2 I need this also

Result is as below.

word1 I need this also

Expected Output :

word1 I want this text----word1 I need this also

Can anybody help me with this please?

I looked at other Stack Overflow questions, but they only discuss replacing a single instance of the pattern.

Atom
  • Possible duplicate of [Why does sed not replace all occurrences?](https://stackoverflow.com/q/15849119/608639) – jww Sep 07 '18 at 01:12
  • @jww That's not a duplicate. OP has a different problem. – Shawn Sep 07 '18 at 01:18
  • You should have included a case like `x word1 foo word1 bar word2 y` in your example so we could see if you'd want the output of that to be `x word1 y` (outermost match) or `x word1 foo word1 y` (innermost match) or something else. – Ed Morton Sep 07 '18 at 02:39
  • @jww It is a different problem. – Atom Sep 07 '18 at 03:23

3 Answers

1

Regular expression quantifiers are greedy: `.*` matches the longest possible string, so here it takes everything from the first 'word1' to the last 'word2'. Standard sed (POSIX regular expressions, including GNU sed) has no non-greedy quantifiers, but perl does, so you could just use that:

perl -pe 's/word1.*?word2/word1/g' 1.txt

That should do the trick. The ? changes the meaning of the preceding * from 'match as many times as possible as long as the rest of the pattern matches' to 'match as few times as possible as long as the rest of the pattern matches'.

Shawn
1
$ sed 's/@/@A/g; s/{/@B/g; s/}/@C/g; s/word1/{/g; s/word2/}/g; s/{[^{}]*}/word1/g; s/}/word2/g; s/{/word1/g; s/@C/}/g; s/@B/{/g; s/@A/@/g' file
word1 I want this text----word1 I need this also

It's lengthy and looks complicated, but it's a technique that is used fairly often and is really just a series of simple steps. It robustly converts word1 to { and word2 to }, so the actual substitution, s/{[^{}]*}/word1/g, deals with characters instead of strings and can use a negated bracket expression to stop the greedy regexp from taking up too much of the line.

See https://stackoverflow.com/a/35708616/1745001 for more info on the general approach used here to be able to turn strings into characters that cannot be present in the input by the time the real work takes place and then restore them again afterwards.
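
The same script can be written one step per line, which may make the phases easier to follow. This is only a restatement of the one-liner above, reading from standard input instead of `file`; the function name `strip_between` and the variable names are hypothetical, chosen for illustration.

```shell
# Hypothetical helper: the one-liner split into commented steps.
strip_between() {
  sed '
    # 1. Escape @, { and } so they cannot appear literally in the line.
    s/@/@A/g
    s/{/@B/g
    s/}/@C/g
    # 2. Map each target string onto a now-unused single character.
    s/word1/{/g
    s/word2/}/g
    # 3. The real work: the negated bracket expression stops at the nearest closing brace.
    s/{[^{}]*}/word1/g
    # 4. Put back any word1 or word2 that was not part of a pair.
    s/}/word2/g
    s/{/word1/g
    # 5. Undo the escaping from step 1, in reverse order.
    s/@C/}/g
    s/@B/{/g
    s/@A/@/g
  '
}

line='word1---sjdkkdkjdk---word2 I want this text----word1---jhfnkfnsjkdnf----word2 I need this also'
result=$(printf '%s\n' "$line" | strip_between)
nested=$(printf '%s\n' 'x word1 foo word1 bar word2 y' | strip_between)

printf '%s\n' "$result"
printf '%s\n' "$nested"
```

On the nested line the script removes only the innermost pair and prints `x word1 foo word1 y`, because `[^{}]*` cannot cross another `{`.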

Ed Morton
  • Let me try this Ed. – Atom Sep 07 '18 at 03:04
  • It worked for most of the cases but didn't work for me when word1 is 's3a://' and word2 is '@'. I have escaped the '/' characters as well. It is adding an extra 'A' somehow. Here is the command I tried for your reference. Shawn's answer above worked for me though. Thank you so much for your help and helping me understand Ed. Upvoted. – Atom Sep 07 '18 at 17:39
  • Right, think about what it's doing. When you want to create characters that don't exist otherwise to map your target strings onto, you can't have one of those strings actually be (or include) one of the characters you're trying to use for that purpose. Choose a different character in that case. Similarly, if your words contain `/` then don't use `/` as the sed delimiter, use some other character. – Ed Morton Sep 07 '18 at 18:28
0

If you only have two instances of the word1-word2 pattern on a line, this should work:

sed 's/\(word1\).*word2\(.*\)\(word1\).*word2\(.*\)/\1\2\3\4/' 1.txt

I capture the parts we want to keep in escaped parentheses, \( and \), and can then refer to those parts as \1, \2 and so on.
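
As a sketch, here is the command run against the sample line from the question, reading from standard input rather than 1.txt (the variable names are only for illustration). It relies on the line containing exactly two word1...word2 pairs.

```shell
line='word1---sjdkkdkjdk---word2 I want this text----word1---jhfnkfnsjkdnf----word2 I need this also'

# \1 and \3 capture the two word1 markers; \2 and \4 capture the text to keep after each word2.
result=$(printf '%s\n' "$line" |
  sed 's/\(word1\).*word2\(.*\)\(word1\).*word2\(.*\)/\1\2\3\4/')

printf '%s\n' "$result"
```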

  • No David, what I provided is sample data. The file may contain any number of occurrences. – Atom Sep 07 '18 at 03:05