I have an XML file which has occasional lines that are split into two, with the first line ending in &#13;. I want to concatenate any such lines and remove the &#13;, perhaps replacing it with a space.

e.g.

<message>hi I am&#13;
here </message>

needs to become

<message>hi I am here </message>

I've tried:

sed -i 's/&#13;\/n/ /g' filename

with no luck.

Any help is much appreciated!

Lev Levitsky
schoon
    SO correctly suggests this as a related question: http://stackoverflow.com/questions/1251999/sed-how-can-i-replace-a-newline-n?rq=1 – Lev Levitsky May 07 '14 at 21:16

4 Answers

You can use this awk:

awk -F"&#13;" '/&#13;$/ {a=$1; next} a{print a, $0; a=""; next} 1' file

Explanation

  • -F"&#13;" sets &#13; as the field separator, so the first field is always the part of the line before it.
  • /&#13;$/ {a=$1; next}: if the line ends with &#13;, store its first field in a and jump to the next line.
  • a{print a, $0; a=""; next}: if a is set, print it together with the current line, separated by a space. Then clear a and jump to the next line.
  • 1 is always true, so it prints the current line.

Sample

$ cat a
yeah
<message>hi I am&#13;
here </message>
hello
bye

$ awk -F"&#13;" '/&#13;$/ {a=$1; next} a{print a, $0; a=""; next} 1' a
yeah
<message>hi I am here </message>
hello
bye
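
A variation, not part of the answer above: with a single buffer variable, only the last of several consecutive lines ending in &#13; is kept. A minimal sketch that avoids the buffer by printing such lines without their trailing newline, assuming LF line endings; the function name join_cr_lines is made up for illustration:

```shell
# join_cr_lines is a hypothetical helper name, not from the original answer.
# A line ending in "&#13;" is printed without a newline and with the marker
# replaced by a space, so the next input line continues on the same output line.
join_cr_lines() {
    awk '{
        # sub() returns 1 when the trailing marker was found and replaced
        if (sub(/&#13;$/, " ")) {
            printf "%s", $0
        } else {
            print
        }
    }' "$@"
}

printf '%s\n' 'yeah' '<message>hi I am&#13;' 'here </message>' 'bye' | join_cr_lines
```

Because nothing is buffered, a run of several continued lines is joined into one output line.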
fedorqui

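Give this gawk one-liner a try (RS="" reads the input in paragraph mode, and the +7 keeps the pattern true so that records without a match are still printed):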

awk -v RS="" 'gsub(/&#13;\n/," ")+7' file

Tested here with your example:

kent$ echo "<message>hi I am&#13;
here </message>"|awk -v RS="" 'gsub(/&#13;\n/," ")+7'  
<message>hi I am here </message>
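
The same idea can be written with an explicit action instead of the +7 trick, which may be easier to read. This is only a sketch; the function name join_paragraph_mode is hypothetical. One thing to keep in mind with RS="": blank lines separate records, so they do not appear in the output.

```shell
# join_paragraph_mode is a hypothetical helper name used for illustration.
# RS="" makes awk read blank-line-separated paragraphs as single records,
# so a newline inside a paragraph is visible to gsub().
join_paragraph_mode() {
    awk -v RS="" '{ gsub(/&#13;\n/, " "); print }' "$@"
}

printf '%s\n' '<message>hi I am&#13;' 'here </message>' 'hello' | join_paragraph_mode
```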
Kent

This should work for you:

sed -i ':q;/&#13;$/{N;s/&#13;\n/ /;bq;}' <filename>

However, replacing newlines with sed is error-prone, and it is easy to make a mistake.

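Another, simpler option uses tr:

tr -s '\&\#13\;\n' ' ' < <filename>

Note that tr replaces each character listed in SET1 (&, #, 1, 3, ; and newline) with a space wherever it occurs, not only as part of &#13; followed by a newline, so this is only safe when those characters appear nowhere else in the input. Without -s it would have printed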

<message>hi I am      here </message>

-s from man page:

   -s, --squeeze-repeats
          replace  each  input  sequence of a repeated character that is listed in SET1 with a single occurrence of that character.
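
Since tr works on sets of single characters rather than on strings, it is worth checking how it treats input where those characters appear outside of &#13;. A small sketch with made-up input (the wrapper name tr_join and the sample text are assumptions for illustration):

```shell
# tr_join is a hypothetical wrapper; SET1 is the six characters
# '&', '#', '1', '3', ';' and newline, each translated to a space.
tr_join() {
    tr -s '&#13;\n' ' '
}

# None of the characters below belong to a "&#13;" marker, yet the digits
# 1 and 3, the ';', the '&' and the final newline are all replaced.
printf 'id=13;x&y\n' | tr_join
```

If the real file contains entities such as &amp; or any digits 1 or 3, a string-based tool (awk or sed) is the safer choice.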
PradyJord

Here is a GNU sed version:

sed ':a;$bc;N;ba;:c;s/&#13;\n/ /g' file

Explanation:

sed '
    :a              # Create a label a
    $bc             # If end of file then branch to label c
    N               # Append the next line to pattern space
    ba              # branch back to label a to repeat until end of file
    :c              # Another label c
    s/&#13;\n/ /g   # When end of file is reached perform this substitution
' file
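
As an alternative sketch, not taken from the answer above: GNU sed also has a -z option that treats input as NUL-separated, so a text file without NUL bytes is read as one record and the substitution can see the newlines directly. This assumes a GNU sed version that supports -z; the function name join_with_sed_z is hypothetical.

```shell
# join_with_sed_z is a hypothetical helper name used for illustration.
# -z (GNU sed only) splits input on NUL bytes instead of newlines, so the
# whole file lands in the pattern space and "\n" can be matched.
join_with_sed_z() {
    sed -z 's/&#13;\n/ /g' "$@"
}

printf '%s\n' '<message>hi I am&#13;' 'here </message>' | join_with_sed_z
```

Like the label-based version, this holds the entire file in memory, which may matter for very large inputs.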
jaypal singh