0

CSV data format

1st Format

name,email,mobile,email
a,a@test.com,1234567890,a@test.com

2nd Format
name,email,"mobile,number",email
a,a@test.com,1234567890,a@test.com

3rd Format
name,email,"mobile number",email
a,a@test.com,1234567890,a@test.com

In my above data format email is duplicate in header so I only want to keep first column value for email and second email data with header should be deleted from file.

I have tied this but it;s not working properly

awk  -F'","' 'NR==1{for(i=1;i<=NF;i++)if(!($i in v)){ v[$i];t[i]}}{s=""; for(i=1;i<=NF;i++)if(i in t)s=s sprintf("%s,",$i);if(s){sub(/,$/,"",s);print s}} ' input.csv > output.csv

Please suggest script command for the same

user13000875
  • 387
  • 2
  • 14

2 Answers2

0

Awk is probably not the most practical tool for manipulating CSV files. There are many others.

Here are a few examples, with your data

  • csvtool (sudo apt install csvtool)
$ for f in [123].csv; do echo $f; csvtool col 1-3 "$f"; echo; done
1.csv
name,email,mobile
a,a@test.com,1234567890

2.csv
name,email,"mobile,number"
a,a@test.com,1234567890

3.csv
name,email,mobile number
a,a@test.com,1234567890
  • csvcut (sudo apt install csvkit)
$ for f in [123].csv; do echo $f; csvcut -C 4 "$f"; echo; done
1.csv
name,email,mobile
a,a@test.com,1234567890

2.csv
name,email,"mobile,number"
a,a@test.com,1234567890

3.csv
name,email,mobile number
a,a@test.com,1234567890
  • Perl's Text::CSV (sudo apt install libtext-csv-perl)
    (This would probably be better suited for more complex needs, and should be in a more readable script file)
$ for f in [123].csv; do echo $f; perl -MText::CSV -lne 'BEGIN{$csv=Text::CSV_XS->new()} if ($csv->parse($_)) {$csv->print(*STDOUT, [ ($csv->fields)[0..2] ]);}' "$f"; echo; done
1.csv
name,email,mobile
a,a@test.com,1234567890

2.csv
name,email,mobile,number
a,a@test.com,1234567890

3.csv
name,email,mobile number
a,a@test.com,1234567890
mivk
  • 13,452
  • 5
  • 76
  • 69
0

If your CSV is well-formed, try

sed 's/^\("\([^"]|""\)*"|\[^",]*\),\("\([^"]|""\)*"|\[^",]*\),\("\([^"]|""\)*"|\[^",]*\),\("\([^"]|""\)*"|\[^",]*\)$/\1,\3,\5/'

Demo: https://ideone.com/7xKlGU

The regex isn't particularly elegant but should work straightforwardly. "\([^"]\|""\)*" matches a quoted field and [^",]* matches a field which isn't quoted. This assumes that either a field in its entirety is quoted, or not at all, and that the escaping mechanism is doubling the double quotes which should be literal, as is the convention in most common CSV dialects.

tripleee
  • 175,061
  • 34
  • 275
  • 318