Removing Windows newlines on Linux (sed vs. awk)

Question

I have some delimited files with improperly placed newline characters in the middle of fields (not at line ends), appearing as ^M in Vim. They originate from freebcp (on CentOS 6) exports of an MSSQL database. Dumping the data in hex shows \r\n patterns:

$ xxd test.txt | grep 0d0a
0000190: 3932 3139 322d 3239 3836 0d0a 0d0a 7c43

I can remove them with awk, but am unable to do the same with sed.

This works in awk, removing the line breaks completely:

awk 'gsub(/\r/,""){printf "%s", $0;next}{print}'

But this in sed does not, leaving line feeds in place:

sed -i 's/\r//g'

where this appears to have no effect:

sed -i 's/\r\n//g'

Using ^M in the sed expression (ctrl+v, ctrl+m) also does not seem to work.

For this sort of task, sed is easier to grok, but I am working on learning more about both. Am I using sed improperly, or is there a limitation?

@ephemient - which pattern is working for you? I have the same version of sed. — kermatt, Jul 27 '12 at 04:02
`sed 's/\r//g'`, even with `POSIXLY_CORRECT=1`. The second one of course does nothing, because `\n` is not part of the pattern space. — ephemient, Jul 27 '12 at 04:42
Does that sed delete the \r\n patterns, or replace them with \n? On my system a replacement occurs, not a removal. — kermatt, Jul 27 '12 at 15:04
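The point made in the comments above about the pattern space can be checked with a small sketch. It assumes GNU sed (which understands \r) and runs on made-up data; the `sample` helper and the variable names are illustrative only and do not come from the original post.

```shell
# Hypothetical sample: two stray CRLF pairs inside a record, then a normal record.
sample() { printf 'a|1\r\n\r\n2|b\nc|3|d\n'; }

# sed strips the trailing \n from each line before running the script,
# so the pattern space never contains \r\n and nothing is replaced.
crlf_attempt=$(sample | sed 's/\r\n//g' | wc -l | tr -d ' ')

# Deleting only \r works, but sed puts the \n back when it prints the line.
cr_only=$(sample | sed 's/\r//g' | wc -l | tr -d ' ')

# What the question asks for: drop the whole \r\n pair, keep bare \n.
wanted=$(sample | awk '{ if (sub(/\r$/, "")) printf "%s", $0; else print }' | wc -l | tr -d ' ')

printf 'sed CRLF attempt: %s lines, sed CR only: %s lines, awk: %s lines\n' \
    "$crlf_attempt" "$cr_only" "$wanted"
```

Both sed variants leave the line count unchanged, while the awk variant joins the broken record back into one line.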

score 68 · Answer 1 · kev · 2012-07-27T08:11:06.080

You can use the command line tool dos2unix

dos2unix input

Or use the tr command:

tr -d '\r' <input >output
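As a rough illustration of what tr does to this kind of data (a sketch on made-up input, where `sample` is a hypothetical helper): tr deletes individual characters rather than sequences, so it can strip the \r bytes but cannot target only the \r\n pairs.

```shell
# Hypothetical sample: two stray CRLF pairs inside a record, then a normal record.
sample() { printf 'a|1\r\n\r\n2|b\nc|3|d\n'; }

# Deleting \r alone keeps every \n, so the line count is unchanged.
cr_deleted=$(sample | tr -d '\r' | wc -l | tr -d ' ')

# Listing both characters deletes every \r AND every \n independently,
# which also removes the legitimate line endings.
both_deleted=$(sample | tr -d '\r\n')

printf 'after deleting CR only: %s lines\n' "$cr_deleted"
printf 'after deleting CR and LF: %s\n' "$both_deleted"
```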

Actually, you can do the file-format switching in vim:

Method A:

:e ++ff=dos
:w ++ff=unix
:e!

Method B:

:e ++ff=dos
:set ff=unix
:w

EDIT

If you want to delete the \r\n sequences in the file, try these commands in vim:

:e ++ff=unix           " <-- make sure open with UNIX format
:%s/\r\n//g            " <-- remove all \r\n
:w                     " <-- save file

Your awk solution works fine. Another two sed solutions:

sed '1h;1!H;$!d;${g;s/\r\n//g}' input
sed ':A;/\r$/{N;bA};s/\r\n//g' input
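To compare the two sed commands above, here is a minimal sketch that feeds both the same input through a pipe instead of a file. It assumes GNU sed (for the \r escape and the one-line syntax with ';' and '}'); the `sample` helper and its data are invented for illustration.

```shell
# Hypothetical sample: two stray CRLF pairs inside a record, then a normal record.
sample() { printf 'a|1\r\n\r\n2|b\nc|3|d\n'; }

# Whole-file approach: collect every line in the hold space,
# then substitute once when the last line is reached.
slurp=$(sample | sed '1h;1!H;$!d;${g;s/\r\n//g}')

# Streaming approach: keep appending the next line only while the
# pattern space ends in \r, then delete the embedded \r\n pairs.
stream=$(sample | sed ':A;/\r$/{N;bA};s/\r\n//g')

printf '%s\n' "$stream"
```

Both should produce the same result here. The first form keeps the entire file in memory, while the second only holds the lines it is currently joining, which may matter for large exports.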

edited Jul 27 '12 at 08:11

answered Jul 27 '12 at 03:05

kev

dos2unix leaves linefeeds (\n) in place. I need to remove them completely. tr only removes the \r, leaving the same result. – kermatt Jul 27 '12 at 03:16
tr -d '[\r\n]' turns the file into one giant line. It appears to remove the characters individually. – kermatt Jul 27 '12 at 03:36
@MattK Why doesn't `dos2unix` work? Can you post your sample input/output file? – kev Jul 27 '12 at 03:53
dos2unix appears to replace \r\n with \n. I need to delete the \r\n patterns, as the file already has Unix line endings, and the Windows pairs are garbage data within the lines. – kermatt Jul 27 '12 at 03:56
2012-07-26|123456||UserName1|0|2004-03-31 00:00:00.000|N||1|0000000002932f3d|San Diego|CA|United States of America|992192-2986^M 3 ^M 4 |CREDIT|2004-03-31 00:00:00.000|2004-03-31 00:00:00.000|31|N|N|N||Y|||0||||0|N|1|1|Y||||||||1||1|||N|2004-05-31 18:21:42.403|DEFAULT|||||||||||N||||Y|||||||| – kermatt Jul 27 '12 at 04:00

In vim, or even in plain-old **vi**, you can also remove Ctrl-M's at the ends of lines by typing `:%s/^V^M//`. The Ctrl-V causes the Ctrl-M to be escaped, so that you can include it in the expression. I do this in FreeBSD and OSX `vi` all the time. – ghoti Jul 27 '12 at 04:26
It does indeed work as expected in Vim. This work is a regular cron job, and IIRC I can pass the ex command through Vim in a shell script? – kermatt Jul 27 '12 at 04:38

score 25 · Accepted Answer · answered Jul 27 '12 at 03:04

I believe some versions of sed will not recognize \r as a character. However, you can use a bash feature to work around that limitation:

echo "$string" | sed $'s/\r//'

Here, you let bash replace '\r' with the actual carriage return character inside the $'...' construct before passing that to sed as its command. (Assuming you use bash; other shells should have a similar construct.)
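For shells without the $'...' construct, the same idea (let the shell, not sed, produce the carriage return) can be sketched with printf. This is only an illustration under that assumption; the variable names and the sample value are made up.

```shell
# Command substitution strips trailing newlines only, so the \r
# written by printf survives and ends up in the variable as a real CR byte.
cr=$(printf '\r')

string=$(printf 'field one\r')      # hypothetical value ending in CR

# sed receives a literal CR inside its script, so it does not need
# to understand the \r escape itself.
clean=$(printf '%s\n' "$string" | sed "s/$cr//")

printf '[%s]\n' "$clean"
```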

answered Jul 27 '12 at 03:04

chepner

This appears to be the case. But I have large text groups to process, ~100MB files. Finding other examples of workarounds in bash. Looking for the one that will work in this situation. – kermatt Jul 27 '12 at 03:37
This seems to be the right path, but in the end, awk appears to be the answer. Its syntax is more complicated, but the regexes I give work as expected (same as in Vim). – kermatt Jul 27 '12 at 21:03

score 10 · Answer 3 · answered May 08 '16 at 12:35

sed -e 's/\r//g' input_file

This works for me. The difference is that it uses -e instead of -i.

Also, I noticed that sed behaves differently on different platforms. Mine reports this for `sed --version`: This is not GNU sed version 4.0

answered May 08 '16 at 12:35

Sergiy Dolnyy

score 7 · Answer 4 · answered Jun 04 '14 at 05:43

Another method

awk 1 RS='\r\n' ORS=

set Record Separator to \r\n
set Output Record Separator to empty string
1 is always true, and in the absence of an action block {print} is used
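A quick way to try this on made-up data is sketched below. One assumption to be aware of: treating a multi-character RS as a pattern is an extension found in gawk and mawk; POSIX leaves the behaviour unspecified, so other awk implementations may differ. The `sample` helper is hypothetical.

```shell
# Hypothetical sample: two stray CRLF pairs inside a record, then a normal record.
sample() { printf 'a|1\r\n\r\n2|b\nc|3|d\n'; }

# Records are split on \r\n only; bare \n stays inside the record text
# and is printed back unchanged because ORS is empty.
joined=$(sample | awk 1 RS='\r\n' ORS=)
lines=$(sample | awk 1 RS='\r\n' ORS= | wc -l | tr -d ' ')

printf '%s\n' "$joined"
printf 'line count: %s\n' "$lines"
```

Because only the \r\n pairs act as separators, the ordinary Unix line endings are left alone.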

answered Jun 04 '14 at 05:43

Zombo

score 0 · Answer 5 · answered Jul 19 '23 at 06:16

My whole file appeared as one line, with "^M" symbols instead of newlines. The only solution that worked for me was to type this command inside vi (don't copy and paste):

:%s/\r/\r/g

then save and exit using 'ZZ'

This command tells Vim to replace each carriage return character (\r, which appears as ^M) with a newline character; in the replacement part of a substitution, Vim treats \r as a line break. The % tells Vim to apply the command to every line in the file.
