
I am looking for a bash-only method for Linux & POSIX environments (no gawk, no Python, no libraries or language dependencies beyond the default CLI tools across Linux/POSIX) to read a multi-line CSV file into variables one line at a time for processing. The CSV values have commas inside double quotes, which is breaking the existing code:

while IFS=, read -r field1 field2 field3 field4 field5 field6 field7 field8 field9 field10 field11 field12 field13 field14 field15 field16 field17 field18 field19 field20
do

Sample CSV input from a single line:

"AC XA, S.A.","City of Commerce","00","0A348E541E6F5C258A12A5674AEF25F28BA7DCFAECEECC4EE63B71B361606AC3","Included","Included","Included","Not Included","09/30/2003","09/30/2037","","SZ=City of Commerce; FE=http://website.org; O=AC XA SA CIF A39201827; C=QR","--BINARY BLOB--maXJtYSBTQSBDSUYgQTgyNzQzMjg3MSMwIQYDVQQLExpodHRwOi8vd3d3LmNoYW1iZXJzaWduLm9yZzEiMCAGA1UEAxMZQ2hhbWJlcnMgb2YgQ29tbWVyY2UgUm9vdDAeFw0wMzA5MzAUIC0gMjAxNjESMBAGA1UEBRMJQTgyNzQzMjg3MRgwFgYDVQRhDA9WQVRFUy1BODI3NDMyODcxGzAZBgNVBAoMEkFDIENBTUVSRklSTUEgUy5BLjEnMCUGA1UEAwweR0xPQkFMIENIQU1CRVJTSUdOIFJPT1QgLSAyMDE2MB4XDTE2MDQxNDA3NTAwNloXDTQwMDQwODA3NTAwNlowggEIMQswCQYDVQQGEwJFUzEPMA0GA1UECAwGTUFEUklEMQ8wDQYDVQQHDAZNQURSSUQxOjA4BgNVBAsMMXNlZSBjdXJyZW50IGFkZHJlc3MgYXQgd3d3LmNhbWVyZmlybWEuY29tL2FkZHJlc3MxJzAlBgNVBAsMHkdMT0JBTCBDSEFNQkVSU0lHTiBST09UIC0gMjAxNjESMBAGA1UEBRMJQTgyNzQzMjg3MRgwFgYDVQRhDA9WQVRFUy1BODI3NDMyODcxGzAZBgNVBAoMEkFDIENBTUVSRklSTUEgUy5BLjEnMCUGA1UEAwweR0xPQkFMIENIQU1CRVJTSUdOIFJPT1QgLSAyMDE2MIICIjANBgkqhkiG9w0BAQEF--END--","SZ1qSRW","Email","AFX Client;Email","AFX Method;AFX Method;Entry Point;AFX Email;Time Stamping","No Entry","",""

The desired output would be reading each of the following value strings per line into variables:

"AC XA, S.A."
"City of Commerce"
"00"
"0A348E541E6F5C258A12A5674AEF25F28BA7DCFAECEECC4EE63B71B361606AC3"
"Included"
"Included"
"Included"
"Not Included"
"09/30/2003"
"09/30/2037"
""
"SZ=City of Commerce; FE=http://website.org; O=AC XA SA CIF A39201827; C=QR"
"--BINARY BLOB--maXJtYSBTQSBDSUYgQTgyNzQzMjg3MSMwIQYDVQQLExpodHRwOi8vd3d3LmNoYW1iZXJzaWduLm9yZzEiMCAGA1UEAxMZQ2hhbWJlcnMgb2YgQ29tbWVyY2UgUm9vdDAeFw0wMzA5MzAUIC0gMjAxNjESMBAGA1UEBRMJQTgyNzQzMjg3MRgwFgYDVQRhDA9WQVRFUy1BODI3NDMyODcxGzAZBgNVBAoMEkFDIENBTUVSRklSTUEgUy5BLjEnMCUGA1UEAwweR0xPQkFMIENIQU1CRVJTSUdOIFJPT1QgLSAyMDE2MB4XDTE2MDQxNDA3NTAwNloXDTQwMDQwODA3NTAwNlowggEIMQswCQYDVQQGEwJFUzEPMA0GA1UECAwGTUFEUklEMQ8wDQYDVQQHDAZNQURSSUQxOjA4BgNVBAsMMXNlZSBjdXJyZW50IGFkZHJlc3MgYXQgd3d3LmNhbWVyZmlybWEuY29tL2FkZHJlc3MxJzAlBgNVBAsMHkdMT0JBTCBDSEFNQkVSU0lHTiBST09UIC0gMjAxNjESMBAGA1UEBRMJQTgyNzQzMjg3MRgwFgYDVQRhDA9WQVRFUy1BODI3NDMyODcxGzAZBgNVBAoMEkFDIENBTUVSRklSTUEgUy5BLjEnMCUGA1UEAwweR0xPQkFMIENIQU1CRVJTSUdOIFJPT1QgLSAyMDE2MIICIjANBgkqhkiG9w0BAQEF--END--"
"SZ1qSRW"
"Email"
"AFX Client;Email"
"AFX Method;AFX Method;Entry Point;AFX Email;Time Stamping"
"No Entry"
""
""

Currently the IFS approach breaks because of the comma within the first value, "AC XA, S.A.". I've tried the following awk with some success, but it isn't clear whether it can be used to read each of the items into variables and specify the line number:

awk -F"," '{for(i=1;i<=NF;i++){if(i%NF==0){ORS="\n"} {if($i ~ /^\"/ || $i ~ /\"$/) {a=a OFS $i;j++;{if(j%2==0){sub(/^\,/,X,a); print a;j=0;a=""}}} else {print $i}}} {ORS=","}}' ORS="," OFS=, file.csv

user3665852
  • Please [edit] your question to clarify what you mean by `and specify the line number`? There's no line number in your desired output. – Ed Morton Apr 11 '23 at 14:49

2 Answers


Don't read the fields into 20 different scalar variables named field1, field2, etc.; read them into an array. Using bash and an awk that can use NUL as the ORS (e.g. GNU awk, but not only GNU awk, so whatever awk you're already using might work), the awk below turns each `","` field separator into `"` + newline + `"` and terminates each record with a NUL, so `read -d ''` loads one CSV line at a time into the array:

$ cat tst.sh
#!/usr/bin/env bash

while IFS=$'\n' read -d '' -r -a fields; do
    printf '%s\n' "${fields[@]}"
done < <( awk -v ORS='\0' '{gsub(/","/,"\"\n\"")} 1' "${@:--}" )

$ ./tst.sh file
"AC XA, S.A."
"City of Commerce"
"00"
"0A348E541E6F5C258A12A5674AEF25F28BA7DCFAECEECC4EE63B71B361606AC3"
"Included"
"Included"
"Included"
"Not Included"
"09/30/2003"
"09/30/2037"
""
"SZ=City of Commerce; FE=http://website.org; O=AC XA SA CIF A39201827; C=QR"
"--BINARY BLOB--maXJtYSBTQSBDSUYgQTgyNzQzMjg3MSMwIQYDVQQLExpodHRwOi8vd3d3LmNoYW1iZXJzaWduLm9yZzEiMCAGA1UEAxMZQ2hhbWJlcnMgb2YgQ29tbWVyY2UgUm9vdDAeFw0wMzA5MzAUIC0gMjAxNjESMBAGA1UEBRMJQTgyNzQzMjg3MRgwFgYDVQRhDA9WQVRFUy1BODI3NDMyODcxGzAZBgNVBAoMEkFDIENBTUVSRklSTUEgUy5BLjEnMCUGA1UEAwweR0xPQkFMIENIQU1CRVJTSUdOIFJPT1QgLSAyMDE2MB4XDTE2MDQxNDA3NTAwNloXDTQwMDQwODA3NTAwNlowggEIMQswCQYDVQQGEwJFUzEPMA0GA1UECAwGTUFEUklEMQ8wDQYDVQQHDAZNQURSSUQxOjA4BgNVBAsMMXNlZSBjdXJyZW50IGFkZHJlc3MgYXQgd3d3LmNhbWVyZmlybWEuY29tL2FkZHJlc3MxJzAlBgNVBAsMHkdMT0JBTCBDSEFNQkVSU0lHTiBST09UIC0gMjAxNjESMBAGA1UEBRMJQTgyNzQzMjg3MRgwFgYDVQRhDA9WQVRFUy1BODI3NDMyODcxGzAZBgNVBAoMEkFDIENBTUVSRklSTUEgUy5BLjEnMCUGA1UEAwweR0xPQkFMIENIQU1CRVJTSUdOIFJPT1QgLSAyMDE2MIICIjANBgkqhkiG9w0BAQEF--END--"
"SZ1qSRW"
"Email"
"AFX Client;Email"
"AFX Method;AFX Method;Entry Point;AFX Email;Time Stamping"
"No Entry"
""
""

and if your awk can't use NULs then, using any awk, you can separate records with a blank line instead and collect fields until an empty line is read:

$ cat tst.sh
#!/usr/bin/env bash

while IFS= read -r field; do
    if [[ -z "$field" ]]; then
        printf '%s\n' "${fields[@]}"
        fields=()
    else
        fields+=( "$field" )
    fi
done < <( awk -v ORS='\n\n' '{gsub(/","/,"\"\n\"")} 1' "${@:--}" )

$ ./tst.sh file
"AC XA, S.A."
"City of Commerce"
"00"
"0A348E541E6F5C258A12A5674AEF25F28BA7DCFAECEECC4EE63B71B361606AC3"
"Included"
"Included"
"Included"
"Not Included"
"09/30/2003"
"09/30/2037"
""
"SZ=City of Commerce; FE=http://website.org; O=AC XA SA CIF A39201827; C=QR"
"--BINARY BLOB--maXJtYSBTQSBDSUYgQTgyNzQzMjg3MSMwIQYDVQQLExpodHRwOi8vd3d3LmNoYW1iZXJzaWduLm9yZzEiMCAGA1UEAxMZQ2hhbWJlcnMgb2YgQ29tbWVyY2UgUm9vdDAeFw0wMzA5MzAUIC0gMjAxNjESMBAGA1UEBRMJQTgyNzQzMjg3MRgwFgYDVQRhDA9WQVRFUy1BODI3NDMyODcxGzAZBgNVBAoMEkFDIENBTUVSRklSTUEgUy5BLjEnMCUGA1UEAwweR0xPQkFMIENIQU1CRVJTSUdOIFJPT1QgLSAyMDE2MB4XDTE2MDQxNDA3NTAwNloXDTQwMDQwODA3NTAwNlowggEIMQswCQYDVQQGEwJFUzEPMA0GA1UECAwGTUFEUklEMQ8wDQYDVQQHDAZNQURSSUQxOjA4BgNVBAsMMXNlZSBjdXJyZW50IGFkZHJlc3MgYXQgd3d3LmNhbWVyZmlybWEuY29tL2FkZHJlc3MxJzAlBgNVBAsMHkdMT0JBTCBDSEFNQkVSU0lHTiBST09UIC0gMjAxNjESMBAGA1UEBRMJQTgyNzQzMjg3MRgwFgYDVQRhDA9WQVRFUy1BODI3NDMyODcxGzAZBgNVBAoMEkFDIENBTUVSRklSTUEgUy5BLjEnMCUGA1UEAwweR0xPQkFMIENIQU1CRVJTSUdOIFJPT1QgLSAyMDE2MIICIjANBgkqhkiG9w0BAQEF--END--"
"SZ1qSRW"
"Email"
"AFX Client;Email"
"AFX Method;AFX Method;Entry Point;AFX Email;Time Stamping"
"No Entry"
""
""

It'll work as long as all of your fields are quoted and none of them contain the string "," or newlines.
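
If you then want to work with individual fields inside the loop, you could strip the surrounding double quotes and copy them into named variables. A minimal sketch building on the first script (the names `name` and `city` are just illustrative, not part of your data):

#!/usr/bin/env bash

while IFS=$'\n' read -d '' -r -a fields; do
    # Remove the leading and trailing double quote from every field.
    for i in "${!fields[@]}"; do
        fields[i]=${fields[i]#\"}
        fields[i]=${fields[i]%\"}
    done
    name=${fields[0]}    # first field,  e.g. AC XA, S.A.
    city=${fields[1]}    # second field, e.g. City of Commerce
    printf 'name=%s city=%s total_fields=%d\n' "$name" "$city" "${#fields[@]}"
done < <( awk -v ORS='\0' '{gsub(/","/,"\"\n\"")} 1' "${@:--}" )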

Ed Morton

The easiest way to do what you are proposing is to use a CSV parser.

A commonly available parser is Ruby's csv library:

ruby -r csv -e 'CSV.parse($<.read).
                    flatten.
                    each{|field| 
                        puts [field].to_csv(force_quotes: true)
                    }' file 

With your example:

"AC XA, S.A."
"City of Commerce"
"00"
"0A348E541E6F5C258A12A5674AEF25F28BA7DCFAECEECC4EE63B71B361606AC3"
"Included"
"Included"
"Included"
"Not Included"
"09/30/2003"
"09/30/2037"
""
"SZ=City of Commerce; FE=http://website.org; O=AC XA SA CIF A39201827; C=QR"
"--BINARY BLOB--maXJtYSBTQSBDSUYgQTgyNzQzMjg3MSMwIQYDVQQLExpodHRwOi8vd3d3LmNoYW1iZXJzaWduLm9yZzEiMCAGA1UEAxMZQ2hhbWJlcnMgb2YgQ29tbWVyY2UgUm9vdDAeFw0wMzA5MzAUIC0gMjAxNjESMBAGA1UEBRMJQTgyNzQzMjg3MRgwFgYDVQRhDA9WQVRFUy1BODI3NDMyODcxGzAZBgNVBAoMEkFDIENBTUVSRklSTUEgUy5BLjEnMCUGA1UEAwweR0xPQkFMIENIQU1CRVJTSUdOIFJPT1QgLSAyMDE2MB4XDTE2MDQxNDA3NTAwNloXDTQwMDQwODA3NTAwNlowggEIMQswCQYDVQQGEwJFUzEPMA0GA1UECAwGTUFEUklEMQ8wDQYDVQQHDAZNQURSSUQxOjA4BgNVBAsMMXNlZSBjdXJyZW50IGFkZHJlc3MgYXQgd3d3LmNhbWVyZmlybWEuY29tL2FkZHJlc3MxJzAlBgNVBAsMHkdMT0JBTCBDSEFNQkVSU0lHTiBST09UIC0gMjAxNjESMBAGA1UEBRMJQTgyNzQzMjg3MRgwFgYDVQRhDA9WQVRFUy1BODI3NDMyODcxGzAZBgNVBAoMEkFDIENBTUVSRklSTUEgUy5BLjEnMCUGA1UEAwweR0xPQkFMIENIQU1CRVJTSUdOIFJPT1QgLSAyMDE2MIICIjANBgkqhkiG9w0BAQEF--END--"
"SZ1qSRW"
"Email"
"AFX Client;Email"
"AFX Method;AFX Method;Entry Point;AFX Email;Time Stamping"
"No Entry"
""
""
dawg