1

I am trying to use sed to remove all newline characters between two search patterns.

I first found this post which taught me how to search between two patterns across lines.

sed -e '/begin/,/end/{s/begin/replacement/p;d}'

Then I found this post to help remove all newlines in a file.

sed ':a;N;$!ba;s/\n/ /g'

I have attempted to combine the two answers and came up with:

sed -e '/begin/,/end/{:a;N;$!ba;s/\n/ /p;d}'

However, it doesn't quite work. It replaces newlines starting from the correct line, but continues until the end of the file. An example is given below:

Sed Command:

sed -e '/Seven/,/Fifteen/{:a;N;$!ba;s/\n/ /g}' input.txt

input.txt:

One Two Three
Four Five Six
Seven Eight Nine
Ten Eleven Twelve
Thirteen Fourteen Fifteen
Sixteen Seventeen Eighteen
Nineteen Twenty Twenty-One

Output:

One Two Three
Four Five Six
Seven Eight Nine Ten Eleven Twelve Thirteen Fourteen Fifteen Sixteen Seventeen Eighteen Nineteen Twenty Twenty-One

What I really want:

One Two Three
Four Five Six
Seven Eight Nine Ten Eleven Twelve Thirteen Fourteen Fifteen
Sixteen Seventeen Eighteen
Nineteen Twenty Twenty-One

Thanks for any help!

Benjamin Leinweber

4 Answers

3

You need to change $ to /Fifteen/:

sed -e '/Seven/,/Fifteen/{:a;N;/Fifteen/!ba;s/\n/ /g}' input.txt

  • $!ba => jump to a if not last line
  • /Fifteen/!ba => jump to a if not match /Fifteen/
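The difference between the two loop conditions can be checked directly against the sample data. The following is a minimal sketch, assuming GNU sed (which accepts `;`-separated labels and prints the pattern space when `N` runs out of input) and a scratch file named `input.txt`; the variable names are only illustrative.

```shell
# Recreate the sample input from the question in a scratch file.
cat > input.txt <<'EOF'
One Two Three
Four Five Six
Seven Eight Nine
Ten Eleven Twelve
Thirteen Fourteen Fifteen
Sixteen Seventeen Eighteen
Nineteen Twenty Twenty-One
EOF

# Original loop: N keeps appending lines until the last line of the file ($).
joined_to_eof=$(sed -e '/Seven/,/Fifteen/{:a;N;$!ba;s/\n/ /g}' input.txt)

# Fixed loop: N keeps appending lines only until the pattern space contains Fifteen.
joined_to_end_pattern=$(sed -e '/Seven/,/Fifteen/{:a;N;/Fifteen/!ba;s/\n/ /g}' input.txt)

# Show the result of the fixed command.
printf '%s\n' "$joined_to_end_pattern"
```

With the original loop everything from `Seven` onward ends up on one line, while the fixed loop stops joining at the line containing `Fifteen`.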

You can make the command shorter:

sed '/Seven/{:a;N;/Fifteen/!ba;s/\n/ /g}' input.txt
kev
  • 1
    Since you're controlling the loop in the code block, you don't need the `,/Fifteen/` in the initial address . – ooga Aug 23 '14 at 02:01
  • 1
    Beware `/Seven/...` will match twice in the sample data given, the first as expected but also with `Seventeen`. This never matches with a following `Fifteen` so terminates with end-of-file and the `N` command. As this may produce unexpected results if the above command were to be ameliorated perhaps `/\<Seven\>/` might be safer. – potong Aug 23 '14 at 03:17
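The double match described in the comment above can be made visible by counting which lines each address selects. This is a minimal sketch, assuming GNU sed, where `\<` and `\>` are regex extensions for the start and end of a word; the file and variable names are only illustrative.

```shell
# Sample input from the question, written to a scratch file.
cat > input.txt <<'EOF'
One Two Three
Four Five Six
Seven Eight Nine
Ten Eleven Twelve
Thirteen Fourteen Fifteen
Sixteen Seventeen Eighteen
Nineteen Twenty Twenty-One
EOF

# Count the lines each start address would select.
# The plain pattern also hits the line containing "Seventeen".
loose_matches=$(sed -n '/Seven/p' input.txt | wc -l)
# The word-bounded pattern selects only the whole word "Seven".
strict_matches=$(sed -n '/\<Seven\>/p' input.txt | wc -l)

# Join the block using word-bounded patterns for both start and end.
joined=$(sed '/\<Seven\>/{:a;N;/\<Fifteen\>/!ba;s/\n/ /g}' input.txt)
printf '%s\n' "$joined"
```

On this particular input GNU sed prints the same result either way, because it prints the pattern space when `N` reaches end-of-file. The word-bounded form simply avoids entering the loop a second time.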
2

Here is an awk version:

awk '/Seven/ && !g {f=1;g=1} /Fifteen/ {f=0} {printf "%s%s",$0,(f?FS:RS)}' file
One Two Three
Four Five Six
Seven Eight Nine Ten Eleven Twelve Thirteen Fourteen Fifteen
Sixteen Seventeen Eighteen
Nineteen Twenty Twenty-One

Here is a GNU awk version (using word boundaries):

awk '/\<Seven\>/ {f=1} /\<Fifteen\>/ {f=0} {printf "%s%s",$0,(f?FS:RS)}' file
One Two Three
Four Five Six
Seven Eight Nine Ten Eleven Twelve Thirteen Fourteen Fifteen
Sixteen Seventeen Eighteen
Nineteen Twenty Twenty-One

Another awk version:

awk '/Seven/ && !/Seven[[:alnum:]]/ && !/[[:alnum:]]Seven/ {f=1} /Fifteen/ {f=0} {printf "%s%s",$0,(f?FS:RS)}' file
One Two Three
Four Five Six
Seven Eight Nine Ten Eleven Twelve Thirteen Fourteen Fifteen
Sixteen Seventeen Eighteen
Nineteen Twenty Twenty-One
Jotne
  • +1 wrt `printf "%s"(f?FS:RS),$0`, while it will behave as expected in this case just be aware it'll fail if FS or RS ever contains a `%` character. It's safer, and IMHO much clearer since it moves the separator after the $0 where it prints, to write `printf "%s%s",$0,(f?FS:RS)`. FYI in non-gawks you can use `/(^|[^[:alnum:]_])Seven([^[:alnum:]_]|$)/` instead of `/\<Seven\>/` so you don't have to write `/Seven/ && !/Seven[[:alnum:]]/ && !/[[:alnum:]]Seven/` – Ed Morton Aug 23 '14 at 12:52
  • Benjamin - the main difference between the above and the solution I posted is in how they'll behave if Seven is present in the input file but Fifteen is not present. Mine will make no changes while Jotne's will replace all newlines with spaces after Seven up until the end of the file. Obviously idk which is the desired behavior since you didn't specify. – Ed Morton Aug 23 '14 at 12:57
  • 1
    @EdMorton I do agree it's better to use `"%s%s",$0,(f?FS:RS)`, but for `FS` or `RS` to contain `%`, you need to set it. – Jotne Aug 23 '14 at 13:33
  • Correct. Not completely sure I understand your point though. I'm just saying that using that syntax will fail if you have to set your FS or RS to contain a `%` to match your data so that's one more reason not to use it (in addition to clarity). – Ed Morton Aug 23 '14 at 13:37
  • I have tried these solutions and they do indeed work. Right now all I really know is RegEx, not sed nor awk. Truth be told, I don't have a clue how to interpret any of the awk code above. I would like to really learn these tools, so I appreciate an awk based answer but I have marked the sed based one as the answer because this is how the question was asked. – Benjamin Leinweber Aug 25 '14 at 16:43
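For readers who are new to awk, the first flag-based one-liner from this answer can be spread over several lines and commented. This is only a sketch of how that program reads, assuming any POSIX awk with the default `FS` (a space) and `RS` (a newline); the scratch file name and the `joined` variable are illustrative.

```shell
# Sample input from the question, written to a scratch file.
cat > input.txt <<'EOF'
One Two Three
Four Five Six
Seven Eight Nine
Ten Eleven Twelve
Thirteen Fourteen Fifteen
Sixteen Seventeen Eighteen
Nineteen Twenty Twenty-One
EOF

joined=$(awk '
  # First line containing Seven: switch the join flag f on.
  # g records that this already happened, so "Seventeen" cannot restart it.
  /Seven/ && !g { f = 1; g = 1 }

  # Line containing Fifteen: switch the flag off before printing,
  # so this line is terminated by a newline.
  /Fifteen/ { f = 0 }

  # Every line: print it, followed by a space while joining, else a newline.
  { printf "%s%s", $0, (f ? FS : RS) }
' input.txt)

printf '%s\n' "$joined"
```

As noted in the comments above, the flag is only cleared by a line containing `Fifteen`, so if that line is missing the joining continues to the end of the file.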
1

sed is an excellent tool for simple substitutions on a single line but for anything else (i.e. anything that involves language constructs other than s, g, and p with -n) just use awk.

Using GNU awk for multi-char RS, \< and \> word boundaries, and gensub():

$ gawk -v RS='^$' -v ORS= '{
    match($0,/\<Seven\>.*\<Fifteen\>/)
    print substr($0,1,RSTART-1) \
          gensub(/\n/," ","g",substr($0,RSTART,RLENGTH)) \
          substr($0,RSTART+RLENGTH)
}' file
One Two Three
Four Five Six
Seven Eight Nine Ten Eleven Twelve Thirteen Fourteen Fifteen
Sixteen Seventeen Eighteen
Nineteen Twenty Twenty-One

No exaggeration - all those arcane sed single-character language constructs (N, b, a, etc.) literally became obsolete in the mid-1970s when awk was invented and people no longer needed to use sed for multi-line editing.

Ed Morton
0

You can also use this sed method:

sed '/Seven/{:loop ; N ;/\nSixteen/{p;d}; s/\n/ /g; t loop}' filename
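The one-liner above comes without an explanation, so here is the same script spread over several lines with comments. This is a sketch of how it appears to work, assuming GNU sed and a scratch file named `input.txt`. Note that it stops on the line after the block (one starting with `Sixteen`) rather than on `Fifteen` itself.

```shell
# Sample input from the question, written to a scratch file.
cat > input.txt <<'EOF'
One Two Three
Four Five Six
Seven Eight Nine
Ten Eleven Twelve
Thirteen Fourteen Fifteen
Sixteen Seventeen Eighteen
Nineteen Twenty Twenty-One
EOF

joined=$(sed '/Seven/{
  :loop
  # Append the next input line to the pattern space.
  N
  # If the line just appended starts with Sixteen, print what has been
  # collected (the joined block plus the Sixteen line) and start a new cycle.
  /\nSixteen/{
    p
    d
  }
  # Otherwise turn the embedded newline into a space.
  s/\n/ /g
  # t branches back to loop because the substitution above succeeded.
  t loop
}' input.txt)

printf '%s\n' "$joined"
```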
Kalanidhi