I have big_file.csv containing a bunch of company information. Here's a snippet
CompanyName, CompanyNumber,RegAddress.CareOf,...
"! # 1 AVAILABLE LOCKSMITH LTD","05905727","",...
"!NSPIRED LIMITED","06019953",""...
"CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD","07981734",""...
I only need the CompanyName and CompanyNumber fields, so I did the following:
cut -d, -f 1,2 big_file.csv > big_file_names_codes_only.csv
As you can see tho (and I understand why) the third entry in the big_file.csv gets cut after the first comma which is actually part of CompanyName. I know how to remove in sed the first comma (but that would break the whole csv strucutre), so i was wondering if any of you knew how to remove the comma from the first (it's always on position 1) "string, with, commas, or not and non alphanum chars!"
.
So basically the intermediate output i am looking for is:
CompanyName, CompanyNumber
"! # 1 AVAILABLE LOCKSMITH LTD","05905727"
"!NSPIRED LIMITED","06019953"
"CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD","07981734"
But this last line becomes:
"CENTRE FOR COUNSELLING, PSYCHOTHERAPY AND TRAINING LTD"
Once I get this intermediate output I need to clean the company of all non alpha num characters in the name and leading spaces - which works very well with this:
sed -i 's/[^a-zA-Z0-9 ,]//g; s/^[ \t]*//'
In the end my file should be:
CompanyName, CompanyNumber,RegAddress.CareOf,...
AVAILABLE LOCKSMITH LTD,05905727
NSPIRED LIMITED,06019953
CENTRE FOR COUNSELLING PSYCHOTHERAPY AND TRAINING LTD,07981734