I have a CSV file into which some ^M DOS line endings have crept, and I want to get rid of them, along with the 16 spaces and 3 tabs that follow each one. In other words, I have to merge the broken line with the next one down. Here's an offending record and a good one as a sample of what I mean:

"Mary had a ^M
                  little lamb", "Nursery Rhyme", 1878
"Mary, Mary quite contrary", "Nursery Rhyme", 1838

I can remove the ^M using sed, as shown below, but I cannot work out how to remove the Unix line ending so that the two lines join back up.

sed -e "s/^M$             //g" rhymes.csv > rhymes.csv

UPDATE

Then I read "However, the Microsoft CSV format allows embedded newlines within a double-quoted field. If embedded newlines within fields are a possibility for your data, you should consider using something other than sed to work with the data file." from: http://sed.sourceforge.net/sedfaq4.html

So I am editing my question to ask: which tool should I be using?

Brad
Cups

2 Answers

2

With help from How can I replace a newline (\n) using sed?, I made this one:

sed -e ':a;N;$!ba;s/\r\n                \t\t\t/=/' -i rhymes.csv

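The pattern matches <CR> <LF> <16 spaces> <3 tabs> and replaces the first such sequence with =. Append g to the s command to fix every occurrence, and change = to whatever should join the two halves of the record.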
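As a rough sketch of how that command behaves, the following builds a small throwaway sample matching the record in the question and joins it. It assumes GNU sed (for the \r, \n and \t escapes), uses -E so the run of spaces and tabs can be written as {16} and {3}, deletes the match instead of inserting =, and adds the g flag so every occurrence is handled. The temp file and the variable names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical scratch file; nothing here touches the real rhymes.csv.
demo=$(mktemp)

# Two records: the first is broken by CR LF + 16 spaces + 3 tabs.
printf '"Mary had a \r\n                \t\t\tlittle lamb", "Nursery Rhyme", 1878\n' > "$demo"
printf '"Mary, Mary quite contrary", "Nursery Rhyme", 1838\n' >> "$demo"

# :a;N;$!ba slurps the whole file into the pattern space, so the
# substitution can see (and delete) the embedded CR LF sequences.
joined=$(sed -E ':a;N;$!ba;s/\r\n {16}\t{3}//g' "$demo")

printf '%s\n' "$joined"
rm -f "$demo"
```

Because the whole file is read into memory before the substitution runs, this approach suits modest file sizes; for very large files a streaming tool may be a better fit.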

If you just want to delete the CR, you could use:

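tr -d "\r" < yourfile > yourfile.tmp && mv yourfile.tmp yourfile

(Writing to a temporary file first avoids truncating yourfile while tr is still reading it. If the input and output files are different, this is enough: <yourfile tr -d "\r" > output)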
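To show what deleting only the CR does and does not achieve, here is a minimal sketch on a throwaway sample. The scratch files and variable names are hypothetical. It counts CR bytes and lines before and after, which illustrates that the record is still split across two lines once the CR is gone.

```shell
#!/bin/sh
# Hypothetical scratch files created with mktemp; the real data is not touched.
src=$(mktemp)
out=$(mktemp)

# One record split by CR LF + 16 spaces + 3 tabs, as described in the question.
printf '"Mary had a \r\n                \t\t\tlittle lamb", "Nursery Rhyme", 1878\n' > "$src"

# Delete every CR, writing to a separate file so the input is never truncated mid-read.
tr -d '\r' < "$src" > "$out"

# tr -cd '\r' keeps only CR bytes, so wc -c counts them.
cr_before=$(tr -cd '\r' < "$src" | wc -c)
cr_after=$(tr -cd '\r' < "$out" | wc -c)
lines_after=$(wc -l < "$out")

echo "CRs before: $cr_before, CRs after: $cr_after, lines after: $lines_after"
rm -f "$src" "$out"
```

The LF, spaces and tabs survive, so a second step is still needed to merge the two halves of the record.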

Community
Lekensteyn
  • I tried that and it does not work for me; the closest I get is using Ctrl-V Ctrl-M to generate ^M – Cups Aug 23 '10 at 17:52
  • tr is neat but it did not join the lines. The sed solution worked - I can then go on and use tr to rm any consecutive spaces throughout the file, thanks a lot. – Cups Aug 24 '10 at 15:41
  • tr translates (or deletes) characters. If you wanted to delete the LF too, you would use `tr -d "\n\r" < yourfile > output` – Lekensteyn Aug 24 '10 at 15:57
  • `tr -d "\r" – Isaac Apr 19 '13 at 07:41
2
dos2unix file_name

to convert the file in place, or

dos2unix -n old_file new_file

to write the converted result to a new file.

Chance
  • Thanks. It left me with the problem of re-identifying and removing this line end in the middle of a record though. – Cups Aug 24 '10 at 15:42