Replacing newline character

Question

I have an XML file which has occasional lines that are split into 2: the first line ending with &#13;. I want to concatenate any such lines and remove the &#13;, perhaps replacing it with a space.

e.g.

<message>hi I am&#13;
here </message>

needs to become

<message>hi I am here </message>

I've tried:

sed -i 's/&#13;\/n/ /g' filename

with no luck.

Any help is much appreciated!

SO correctly suggests this as a related question: http://stackoverflow.com/questions/1251999/sed-how-can-i-replace-a-newline-n?rq=1 — Lev Levitsky, May 07 '14 at 21:16

fedorqui · Answer 1 · 2014-05-07T21:56:20.687

You can use this awk:

awk -F"&#13;" '/&#13;$/ {a=$1; next} a{print a, $0; a=""; next} 1' file

Explanation

-F"" set  as delimiter, so that the first field will be always the desired part of the string.
/$/ {a=$1; next} if the line ends with , store it in a and jump to the next line.
a{print a, $0; a=""; next} if a is set, print it together with current line. Then unset a for future loops. Finally jump to next line.
1 as true, prints current line.

Sample

$ cat a
yeah
<message>hi I am&#13;
here </message>
hello
bye

$ awk -F"&#13;" '/&#13;$/ {a=$1; next} a{print a, $0; a=""; next} 1' a
yeah
<message>hi I am here </message>
hello
bye
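The one-liner above assumes a split spans exactly two lines: if two consecutive lines both end in &#13;, the second overwrites a and the first is lost. A minimal sketch of a variant that accumulates text across any number of consecutive marker lines follows; the function name join_split_lines and the variable buf are made up for illustration and are not part of the original answer.

```shell
# join_split_lines: hypothetical helper name, not from the original answer.
# Joins every line ending in "&#13;" with the line(s) that follow it.
join_split_lines() {
  awk '
    /&#13;$/ {                   # line ends with the marker
      sub(/&#13;$/, "")          # drop the marker
      buf = buf $0 " "           # keep the text plus a joining space
      next
    }
    { print buf $0; buf = "" }   # flush buffered text together with this line
    END { if (buf != "") printf "%s\n", buf }   # input ended on a marker line
  ' "$@"
}

printf '%s\n' 'yeah' '<message>hi&#13;' 'I am&#13;' 'here </message>' 'bye' | join_split_lines
```

Called with no arguments the function reads standard input; with a file name it reads that file, as in join_split_lines file.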

score 2 · Answer 2 · answered May 07 '14 at 21:33


Give this gawk one-liner a try:

awk -v RS="" 'gsub(/&#13;\n/," ")+7' file

Tested here with your example:

kent$ echo "<message>hi I am&#13;
here </message>"|awk -v RS="" 'gsub(/&#13;\n/," ")+7'  
<message>hi I am here </message>


Kent


PradyJord · Answer 3 · 2014-05-07T22:01:04.360

This will work for you:

sed -i ':q;/&#13;$/{N;s/&#13;\n/ /;bq;}' <filename>

However, replacing newlines with sed is always a bash (read: bad) idea. The chances of making an error are high.

So here is another, simpler solution:

tr -s '\&\#13\;\n' ' ' < <filename>

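Note that tr works on a set of single characters, not on the string &#13;, so every &, #, 1, 3, ; and newline anywhere in the file is turned into a space; this is only safe for input like the sample in the question. Each matched character is replaced with a space, so without -s it would have printed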

<message>hi I am      here </message>

-s from man page:

   -s, --squeeze-repeats
          replace  each  input  sequence of a repeated character that is listed in SET1 with a single occurrence of that character.

score 2 · Accepted Answer · answered May 07 '14 at 22:05


Here is a GNU sed version:

sed ':a;$bc;N;ba;:c;s/&#13;\n/ /g' file

Explanation:

sed '
    :a              # Create a label a
    $bc             # On the last line, branch to label c
    N               # Otherwise append the next line to pattern space
    ba              # Branch back to label a to repeat until the last line
    :c              # Another label c
    s/&#13;\n/ /g   # With the whole file in pattern space, perform this substitution
' file
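If GNU sed 4.2.2 or newer is available, the same whole-file substitution can probably be written without the label loop by using the -z (--null-data) option, which splits input on NUL bytes instead of newlines. For a text file containing no NUL bytes, the whole file then arrives as a single record. This is an alternative sketch, not part of the original answer; it assumes the file fits in memory, and the function name join_with_sed_z is made up for illustration.

```shell
# Assumes GNU sed with -z (--null-data); other sed implementations may lack it.
# join_with_sed_z is a hypothetical helper name used for illustration.
join_with_sed_z() {
  # With -z the newline is ordinary data, so \n can be matched directly.
  sed -z 's/&#13;\n/ /g' "$@"
}

printf '%s\n' '<message>hi I am&#13;' 'here </message>' | join_with_sed_z
```

The label loop in the answer above has the same effect and does not depend on -z.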


jaypal singh



This worked after I ran dos2unix on my file. Thanks for everyone's help. – schoon May 08 '14 at 19:59
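The comment above suggests the file had DOS (CRLF) line endings. In that case each split line ends in &#13; followed by a carriage return and then the newline, so patterns such as &#13;\n or &#13;$ never match. A minimal sketch for the case where dos2unix is not installed: strip the carriage returns with tr first, then apply the sed command from the accepted answer. The function name join_crlf_file is made up for illustration.

```shell
# join_crlf_file is a hypothetical helper name; it reads stdin and writes stdout.
join_crlf_file() {
  tr -d '\r' |                            # delete every carriage return (CRLF -> LF)
    sed ':a;$bc;N;ba;:c;s/&#13;\n/ /g'    # accepted answer: slurp the file, then substitute
}

printf '<message>hi I am&#13;\r\nhere </message>\r\n' | join_crlf_file
```

Note that tr -d '\r' removes all carriage returns, including any that are not part of a line ending, and that the output uses Unix line endings.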
