
I have a .csv file with a comma as the delimiter and 600 fields. When I open the file in Excel, field 531 has a value, but when I try to extract that field on Linux with the cut or awk command, I do not get the same data. Any suggestions?

File format : 92b-a727-6fbc59a453a6"","","","be41bbe6-f813-4","","","",""

When opened in Excel, field 531 shows as:

25e9417a-bc84-4a32-bc42-95ca70dce112    
124e8d11-3326-41f1-9b1a-7258332bd493    
f2c98d41-daa5-423f-82ee-787e4f64dfe8    
be41bbe6-f813-492b-a727-6fbc59a453a6    
a1ef5423-93d7-4cf4-ba37-2eb2cb4a7611    
d12116e4-3427-4139-8d7c-41947e8534cc

When I run `cut -d "," -f531` or `awk -F',' '{print $531}'`, I get:

""
united states"
""
"be41bbe6-f813-492b-a727-6fbc59a453a6"
"a1ef5423-93d7-4cf4-ba37-2eb2cb4a7611"
"d12116e4-3427-4139-8d7c-41947e8534cc"

Please suggest.

I even replaced "," with | and tried to extract the field using cut and awk, but I still get the same result.

Thanh Nguyen Van
SGS
  • 1
    Probably you have embedded commas in some of your fields. If you run `awk -F, '{print NF}' file | sort -u` and see output of numbers bigger than 600, that's what the problem is. – jas Apr 27 '18 at 03:57
  • If you have gnu awk you can try the `FPAT` solution here: https://stackoverflow.com/a/17287068/2229272 – jas Apr 27 '18 at 04:04
  • 2
    If you did a global substitution with sed/awk, then you've also changed the "embedded" `,`s, as in `"IL, United States"`. You need to export from Excel using `|` as the field separator, and then your code should work. ++ for @jas's comment. Good luck. – shellter Apr 27 '18 at 04:31
  • The file I am trying to access has millions of records; the minimum file size is 25 GB. – SGS Apr 27 '18 at 04:32
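
The comments above point at commas embedded inside quoted fields, which `cut` and `awk -F,` both treat as separators. Where GNU awk's `FPAT` is not available, a quote-aware split can be sketched in plain awk as below. This is only a sketch under assumptions: the helper name `csv_field` is hypothetical, fields are assumed to be quoted with `"` (a doubled `""` inside a quoted field meaning a literal quote), and quoted fields are assumed never to contain line breaks. It walks each line character by character, so it may be slow on a 25 GB file.

```shell
# csv_field N: read CSV lines on stdin and print field N of each line,
# treating commas inside double quotes as data rather than separators.
# Hypothetical helper; assumes no quoted field spans more than one line.
csv_field() {
    awk -v want="$1" '
    {
        n = 1; field = ""; inq = 0; len = length($0)
        for (i = 1; i <= len; i++) {
            c = substr($0, i, 1)
            if (c == "\"") {
                if (inq && substr($0, i + 1, 1) == "\"") {
                    field = field c; i++      # doubled quote: keep one literal quote
                } else {
                    inq = !inq                # opening or closing quote: toggle state
                }
            } else if (c == "," && !inq) {
                if (n == want + 0) break      # reached the end of the wanted field
                n++; field = ""               # separator: start the next field
            } else {
                field = field c               # ordinary character, or comma inside quotes
            }
        }
        # Print an empty line when the record has fewer than N fields.
        print (n == want + 0 ? field : "")
    }'
}
```

With a file named, say, `file.csv` (a placeholder name), the call would be `csv_field 531 < file.csv`. The surrounding quotes are stripped from the output, so the values should look like the Excel column rather than the quoted `cut` output.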

0 Answers