38

I have some delimited files with improperly placed newline characters in the middle of fields (not at line ends), which appear as ^M in Vim. They originate from freebcp (on CentOS 6) exports of an MSSQL database. Dumping the data in hex shows \r\n patterns:

$ xxd test.txt | grep 0d0a
0000190: 3932 3139 322d 3239 3836 0d0a 0d0a 7c43

I can remove them with awk, but am unable to do the same with sed.

This works in awk, removing the line breaks completely:

awk 'gsub(/\r/,""){printf $0;next}{print}'

But this in sed does not, leaving line feeds in place:

sed -i 's/\r//g'

and this appears to have no effect:

sed -i 's/\r\n//g'

Using ^M in the sed expression (ctrl+v, ctrl+m) also does not seem to work.

For this sort of task, sed is easier to grok, but I am working on learning more about both. Am I using sed improperly, or is there a limitation?

kermatt

5 Answers

68

You can use the command-line tool dos2unix:

dos2unix input

Or use the tr command:

tr -d '\r' <input >output

Actually, you can do the file-format switching in vim:

Method A:
:e ++ff=dos
:w ++ff=unix
:e!
Method B:
:e ++ff=dos
:set ff=unix
:w

EDIT

If you want to delete the \r\n sequences in the file, try these commands in vim:

:e ++ff=unix           " <-- make sure open with UNIX format
:%s/\r\n//g            " <-- remove all \r\n
:w                     " <-- save file

Your awk solution works fine. Here are two more sed solutions:

sed '1h;1!H;$!d;${g;s/\r\n//g}' input
sed ':A;/\r$/{N;bA};s/\r\n//g' input
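
For context on why a plain s/\r\n//g has no effect: sed reads its input one line at a time and strips the trailing newline before running the script, so \n never appears in the pattern space unless lines are joined first (with N or the hold space). Below is a minimal sketch of the second command in a more portable spelling, assuming a POSIX shell. The sample input and the variable names cr and joined are made up for illustration, and the carriage return is built with printf because not every sed understands \r.

```shell
# Build a literal carriage return; some sed versions do not understand \r.
cr=$(printf '\r')

# Hypothetical sample: a field broken by two stray CR LF pairs, then a normal line.
# While the pattern space ends in CR, append the next line (N) and loop;
# once it no longer does, delete every CR LF pair that was collected.
joined=$(printf 'a\r\n\r\nb|c\nd\n' |
  sed -e ':a' -e "/$cr\$/{" -e 'N' -e 'ba' -e '}' -e "s/$cr\\n//g")

printf '%s\n' "$joined"
```

If the very last line of the input ends in a carriage return, N has nothing left to append, and sed implementations differ in what they do at that point, so treat this as a sketch rather than a drop-in command.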
kev
  • dos2unix leaves linefeeds (\n) in place. I need to remove them completely. tr only removes the \r, leaving the same result. – kermatt Jul 27 '12 at 03:16
  • tr -d '[\r\n]' turns the file into one giant line. It appears to remove the characters individually. – kermatt Jul 27 '12 at 03:36
  • @MattK Why doesn't `dos2unix` work? Can you post your sample input/output file? – kev Jul 27 '12 at 03:53
  • dos2unix appears to replace \r\n with \n. I need to delete the \r\n patterns, as the file already has Unix line endings, and the Windows pairs are garbage data within the lines. – kermatt Jul 27 '12 at 03:56
  • 2012-07-26|123456||UserName1|0|2004-03-31 00:00:00.000|N||1|0000000002932f3d|San Diego|CA|United States of America|992192-2986^M 3 ^M 4 |CREDIT|2004-03-31 00:00:00.000|2004-03-31 00:00:00.000|31|N|N|N||Y|||0||||0|N|1|1|Y||||||||1||1|||N|2004-05-31 18:21:42.403|DEFAULT|||||||||||N||||Y|||||||| – kermatt Jul 27 '12 at 04:00
  • In vim, or even in plain-old **vi**, you can also remove Ctrl-M's at the ends of lines by typing `:%s/^V^M//`. The Ctrl-V causes the Ctrl-M to be escaped, so that you can include it in the expression. I do this in FreeBSD and OSX `vi` all the time. – ghoti Jul 27 '12 at 04:26
  • It does indeed work as expected in Vim. This work is a regular cron job, and IIRC I can pass the ex command through Vim in a shell script? – kermatt Jul 27 '12 at 04:38
25

I believe some versions of sed will not recognize \r as a character. However, you can use a bash feature to work around that limitation:

echo "$string" | sed $'s/\r//'

Here, you let bash replace '\r' with the actual carriage return character inside the $'...' construct before passing that to sed as its command. (Assuming you use bash; other shells should have a similar construct.)
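
In shells without $'...', a similar effect can probably be had by building the carriage return with printf and interpolating it into the sed script. A minimal sketch, assuming a POSIX shell; the sample string and the variable names cr and cleaned are hypothetical:

```shell
# Build a literal carriage return without relying on bash's $'...' quoting.
cr=$(printf '\r')

# Hypothetical sample: one line with a stray carriage return in the middle.
# Double quotes let the shell substitute the real CR byte into the sed script.
cleaned=$(printf 'abc\rdef\n' | sed "s/$cr//g")

printf '%s\n' "$cleaned"
```

As with the $'...' form, sed only ever sees the actual carriage return byte, so this does not depend on whether sed itself recognizes \r. It still removes only the \r; the line feed that follows is left in place.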

chepner
  • This appears to be the case. But I have large text groups to process, ~100MB files. Finding other examples of workarounds in bash. Looking for the one that will work in this situation. – kermatt Jul 27 '12 at 03:37
  • This seems to be the right path, but in the end, awk appears to be the answer. Its syntax is more complicated, but the regexes I give work as expected (same as in Vim). – kermatt Jul 27 '12 at 21:03
10

sed -e 's/\r//g' input_file

This works for me. The difference is using -e instead of -i.

Also, I noticed that sed behaves differently on different platforms. On mine, sed --version reports: This is not GNU sed version 4.0

Sergiy Dolnyy
7

Another method

awk 1 RS='\r\n' ORS=
  • set Record Separator to \r\n
  • set Output Record Separator to empty string
  • 1 is always true, and in the absence of an action block {print} is used
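
The points above can be tried on a small sample. This is a sketch that assumes an awk accepting a multi-character RS, such as gawk or mawk (POSIX leaves that case unspecified, and some awks use only the first character). The sample input and the variable name joined are made up, and the assignments are written with -v rather than as trailing operands, which should behave the same here.

```shell
# Hypothetical sample: a field broken by two stray CR LF pairs, then a normal line.
# Records are split on CR LF and printed back with an empty separator, so the
# pairs vanish while ordinary LF line endings inside each record are kept.
joined=$(printf 'a\r\n\r\nb|c\nd\n' | awk -v RS='\r\n' -v ORS='' 1)

printf '%s\n' "$joined"
```

Because plain line feeds are not record separators in this setup, the real line endings pass through untouched, which is the behavior the question asks for.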
Zombo
0

My whole file appeared as one line, with "^M" symbols instead of newlines. The only solution that worked for me was to type this command inside vi (don't copy and paste):

:%s/\r/\r/g

then save and exit using 'ZZ'

This command tells Vim to replace each carriage return character (\r, which appears as ^M) with a newline character. The % tells Vim to apply the command to every line in the file.

Amir Uval