SED: Deleting text between two strings, repeated across the line

Question

I want to remove all text between two strings on a line using sed. I understand that sed -i 's/str1.*str2//' file.dat removes the text between str1 and str2, inclusive of str1 and str2, but my line has str1 and str2 repeated many times, and I would like to remove the text between each pair. My attempt above removes all text between the first instance of str1 and the last instance of str2. I would appreciate some help in understanding how to do this.

In addition I would like to repeat this across all lines in the file, and do not know how many times the str1, str2 pair appears on each line. It varies.

Kind Regards

Additional Edit - hope not into a flame-wall!

An example may be of use; I am having trouble understanding the answers thus far, sorry guys.

For a single line in a file example.dat;

bla.bla.TextOfUnknownLength.bla.bla 1023=3 290=1 336=17 273=07:59:57.833 276=K 278=0 bla.bla.TextOfUnknownLength.bla.bla 1023=20 290=2 336=7 273=07:59:57.833 276=K 278=0 bla.bla.TextOfUnknownLength.bla.bla ...

I wish to remove from 1023= to 278= inclusive (but not the 0 after 278=) in all instances. The text between 1023= and 278= can occur many times in a line and is of unknown length.

There are also many lines in the file, and I would like to run this across all lines.

HS

You may want to check [How to select lines between two marker patterns which may occur multiple times with awk/sed](http://stackoverflow.com/a/17988834/) — fedorqui, Jan 19 '15 at 10:06
@Boris Fedorqui's answer to that question would not work here, as it matches lines between the patterns, whereas the OP's text (as far as I can tell) is on single lines. — , Jan 19 '15 at 11:06

Marc Bredt · Answer 1 · 2015-01-19T16:36:50.620

2

sed -ri 's/(foo)(.*)(bar)/\1\3/g' between.file

Explanation: use extended regular expressions (-r) to capture the part before, the part between, and the part after in your line. Then replace the match with the prefix \1 and the suffix \3, using sed's backreferences (written with leading backslashes).

UPDATE: Consider between.file contains the following contents.

foo---1---bar
foo---2---bar
foo---3---bar

Then the command above removes the contents between foo and bar, so the output looks like

foobar
foobar
foobar

Wasn't that your desired output/change in your file?

UPDATE: I think awk fits better for your needs.

Assume the between.file contains the following lines

A foo---1---bar B foo---10--bar C 
A foo---2---bar D foo---20--bar E 
A foo---3---bar B foo---30---bar C

this script

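#!/bin/bash
# Rebuild each line from the whitespace-separated fields that do not
# match foo...bar. Fields are numbered from 1 ($0 is the whole line).
awk '{
       all = "";
       for (i = 1; i <= NF; i++) {
         if (!($i ~ /foo.*bar/)) { all = all " " $i; }
       };
       print all;
     }' between.file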

will produce the following output

 A B C
 A D E
 A B C

You could use this to implement some kind of DFA that switches into a specific state when it reads 1023= and leaves that state when it reads 278=.
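One way to read the DFA idea is as a two-state scan per line: copy text while outside a pair, skip text while inside one. The following is only a sketch under that reading, not code from the answer. The function name strip_pairs_awk and the sample line are made up for illustration, and the markers are treated as literal strings via index(), so no regex escaping is involved.

```shell
# Hypothetical helper: drop every start...stop span (markers included) from stdin.
strip_pairs_awk() {
  awk -v start='1023=' -v stop='278=' '{
    out = ""; rest = $0
    # State "outside": look for the next start marker.
    while ((s = index(rest, start)) > 0) {
      tail = substr(rest, s + length(start))
      # State "inside": look for the first stop marker after it.
      e = index(tail, stop)
      if (e == 0) break                     # no closing marker: keep the rest untouched
      out = out substr(rest, 1, s - 1)      # keep the text before the start marker
      rest = substr(tail, e + length(stop)) # resume after the stop marker
    }
    print out rest
  }'
}

# Made-up sample line, shortened from the format in the question.
printf '%s\n' 'X 1023=3 290=1 278=0 Y 1023=20 278=0 Z' | strip_pairs_awk
# prints: X 0 Y 0 Z
```

Unlike the field-based loop above, this does not depend on the removed text being free of whitespace. A marker is matched anywhere it appears, so 1023= would also match inside a longer token such as 91023=.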

Redirect the output to a new file or search the docuMANtation for awk to process directly on a file. hope this helps.

edited Jan 19 '15 at 16:36

answered Jan 19 '15 at 10:10


No need to capture `(.*)`. Good answer though :) – Jan 19 '15 at 11:04
Thanks guys, just for clarity, and re: my example above; sed -ri 's/(1023=)(.*)(278=)/????/g' between.file What do I put in place of "\1" and "\3" in your example to remove the text – Hanna Jan 19 '15 at 12:36
Nothing needs to go there. `\1` and `\3` are replaced with `1023=` and `278=` respectively, so that 1023=278= is printed/written. – Marc Bredt Jan 19 '15 at 13:35
Thanks again Emil - I see that working for the line - but removes all between the FIRST 1023= and the LAST 278=. The line has many pairs of these across the line, and I am trying to remove the text between each pair, for all pairs. Regards. HS – Hanna Jan 19 '15 at 14:55
for between.file containing the following contents; A foo---1---bar X foo---10--bar Y A foo---2---bar X foo---20--bar Y A foo---3---bar X foo---30---bar Y I am looking for the resultant file to be; A X Y A X Y A X Y the string "foo--sometext---bar" is repeated many times over the line – Hanna Jan 19 '15 at 15:07
with each line in my latest comment starting with A - cant get the formatting right sorry - noob – Hanna Jan 19 '15 at 15:10
is the length/amount of the portions between known? e.g. the foo-bar combination occurs `n` times? or does that occur random times in each line? – Marc Bredt Jan 19 '15 at 15:30
length between foo and bar random, for each instance, and foo-bar pairs are random for each line - (anything from 1 to over 70 occurrences) - thanks for all time on this Emil - will need to get more into awk - one more thing (promise) does this work on windows gawk ? think i need to change but don't know how sorry. Kind Regards HS – Hanna Jan 19 '15 at 17:25
just give it a try. with gawk on *nix it works. i think if your gawk is shipped with cygwin or in any other way it should work as well. – Marc Bredt Jan 19 '15 at 17:29

score 0 · Answer 2 · answered Jan 19 '15 at 11:47

Just add the g at the end of your sed command.

sed -i 's/str1.*str2//g' file.dat

g means: for each occurrence in the current buffer, which by default is the current line.
By default sed works on one line at a time, then at the end of the action continues with the next one.

Remarks on this:

if str1 and str2 are not on the same line, nothing between those two is changed
str1 and str2 are part of the pattern, so some special characters sometimes need to be escaped (such as (, {, [, \, &, ^, ., ...), depending on the wanted behaviour.
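A quick way to see what g does and does not change. This is a sketch that assumes a POSIX-style sed; the sample strings (aXbXc and the foo/bar line) are made up for illustration and are not from the asker's file.

```shell
# Without g only the first match on the line is replaced; with g, every match.
first_only=$(printf '%s\n' 'aXbXc' | sed 's/X/-/')
every_match=$(printf '%s\n' 'aXbXc' | sed 's/X/-/g')

# .* is greedy, so foo.*bar is one single match from the first foo to the last bar.
greedy=$(printf '%s\n' 'A foo1bar B foo2bar C' | sed 's/foo.*bar//g')

printf '%s\n' "$first_only" "$every_match" "$greedy"
# prints:
# a-bXc
# a-b-c
# A  C
```

In the last case g finds only one occurrence to replace, because the greedy match already spans both pairs, so B is removed along with them.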

score 0 · Answer 3 · answered Jan 19 '15 at 15:21

This might work for you (GNU sed):

sed -r ':a;s/([^\n]*)(foo)[^\n]+(bar)/\1\n\2\3/;ta;s/\n//g' file

Use greed, a unique delimiter and a loop to remove the characters between foo and bar. The greed works backwards through the line, and the delimiter stops the part of the line that has already been processed from being processed again. The loop removes one or more occurrences of foo through bar.
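The same loop can be adapted to the markers in the question, as a sketch assuming GNU sed. The function name strip_pairs and the shortened sample line are made up for illustration. Here the markers themselves are dropped too, so the replacement keeps only the prefix and the delimiter.

```shell
# Hypothetical helper: delete every 1023=...278= span (markers included) from stdin.
# \1 is the text before the last remaining pair; \n marks where processing stopped.
strip_pairs() {
  sed -r ':a;s/([^\n]*)1023=[^\n]*278=/\1\n/;ta;s/\n//g'
}

printf '%s\n' 'head 1023=3 290=1 278=0 mid 1023=20 290=2 278=0 tail' | strip_pairs
# prints: head 0 mid 0 tail
```

Once the output looks right on a copy of the data, the same expression can be run with -i on the file itself.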
