I have a malformed CSV file with many lines similar to:

a;b;c;d;e;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;^M

I am struggling to find the right regular expression in my Vi editor to remove the run of contiguous trailing semicolons (there are many more on each row) together with the DOS ^M, leaving just the clean data, such as:

a;b;c;d;e;
Robert Alexander

1 Answer

First, remove the trailing semicolons, together with the carriage return (displayed as ^M) that sits between them and the end of the line:

:%s/;\+\r\?$//

Then set the file format so that line breaks are written as LF:

:set ff=unix

And save the file:

:w
Wiktor Stribiżew
  • Thanks Wiktor, but in my MacVim editor your substitution string just hangs there and does not seem to change anything. Might it be an implementation problem in this version of vi? I wonder if using awk or sed would be a better bet. – Robert Alexander Nov 24 '21 at 08:17
  • @RobertAlexander It is so much easier with `sed`: `sed -E 's/;+\r?$//g' file > newfile`. To modify the file contents: ``sed -i '' -E 's/;+\r?$//g' file`` – Wiktor Stribiżew Nov 24 '21 at 08:21
  • Thank you so much. I am having problems: running `sed -E 's/;+\r?$//g' listaC.csv > test.csv` fails with `sed: RE error: illegal byte sequence`, so it seems that the character encoding is not UTF-8, but who knows what it actually is :( – Robert Alexander Nov 24 '21 at 08:46
  • @RobertAlexander See [this thread](https://stackoverflow.com/questions/19242275). Maybe adding `LC_CTYPE=C` before `sed` will work for you. Else, I think you can get much better user experience with a GNU `sed`. See [How to install gnu sed on Mac OS X and set it as the default](https://gist.github.com/andre3k1/e3a1a7133fded5de5a9ee99c87c6fa0d). – Wiktor Stribiżew Nov 24 '21 at 08:51
  • Thanks Wiktor, will try. Have a nice day. – Robert Alexander Nov 24 '21 at 08:53
  • UPDATE: yes, `export LC_CTYPE=C` before `sed` worked. Thanks – Robert Alexander Nov 24 '21 at 15:15
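
For reference, the `sed` route discussed in the comments can be sketched as a small shell function. This is only a sketch under a few assumptions: the name `strip_trailing` is made up for illustration, the carriage return is passed to `sed` as a literal byte (in case the local `sed` does not understand `\r`), and `LC_ALL=C` is used as a blunt way to make `sed` treat the input as raw bytes so that a non-UTF-8 file does not trigger "illegal byte sequence".

```shell
# Hypothetical helper (name chosen for illustration): reads lines on stdin,
# removes any run of trailing semicolons plus an optional carriage return.
strip_trailing() {
  # Literal CR byte; command substitution only strips trailing newlines.
  cr=$(printf '\r')
  # LC_ALL=C: treat input as bytes, sidestepping encoding errors in sed.
  LC_ALL=C sed -E "s/;*${cr}?\$//"
}

# Throwaway sample line shaped like the rows in the question.
printf 'a;b;c;d;e;;;;;;\r\n' | strip_trailing
```

With the file names from the comments above, this would be run as `strip_trailing < listaC.csv > test.csv`. Note that it removes every trailing semicolon; if one delimiter should remain, as in the sample output in the question, the pattern could use `;+` with `;` as the replacement text instead.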