0

In this post, I had asked a question as to how I can have my script read a CSV header and auto-quote string values; i.e. automatically wrap those columns whose data type is string and would need "" in order to be inserted into a SQL table. Apparently, this would be more than what Bash can handle?

At any rate, I'd welcome any help regarding how I can make the following script work: here, I have basically the same script but I tried to manually define separate columns and assign them to variables with col3 being a string column, hence quoted. Needless to say, it doesn't do what I want it to do (i.e. give me quoted values for all the string data under col3). Thanks!

#!/bin/bash

echo Path to to-be-imported CSV:
read csv_file
echo Table name to import into:
read table

echo "INSERT INTO $table VALUES" > SQL_INSERT_$table.txt
while read col1 col2 col3 col4
do
 echo "($col1 $col2 "$col3" $col4),"
done < <(tail -n +2 $csv_file) >> SQL_INSERT_$table.txt && sed -i '' '$ s/.$/;/' SQL_INSERT_$table.txt
Kay Gee
  • 97
  • 8
  • Does this answer your question? [How to auto-quote "string values" of a SQL INSERT STATEMENT in Bash script](https://stackoverflow.com/questions/66716484/how-to-auto-quote-string-values-of-a-sql-insert-statement-in-bash-script) – Mr R Mar 21 '21 at 23:11
  • What the output (an example) you expect? Perhaps you could use `printf "(%q, %q, %q, %q)" "$col1" "$col2" "$col3" "$col4")` instead of quoting yourself? – NoDataFound Mar 21 '21 at 23:11
  • Thanks @NoDataFound. But I still need to determine in advance which columns I want quoted or not, right? My original question was if there were a way to identify string columns and auto-quote them (on the part of the script) so that I can just feed the whole header to a single variable. – Kay Gee Mar 21 '21 at 23:22
  • @KayGee which DB are you using ? Perhaps these answers will help - https://stackoverflow.com/questions/13405572/sql-statement-to-get-column-type – Mr R Mar 21 '21 at 23:26
  • Does this answer your question? [How do I use variables in single quoted strings?](https://stackoverflow.com/questions/21192420/how-do-i-use-variables-in-single-quoted-strings) – June7 Mar 21 '21 at 23:41
  • Can you give some sample data as in csv_file? – alex Mar 22 '21 at 00:32
  • @June7 I want to have $col3 expand and get quoted in the output at the same time (the column in CSV comes w/o quotes). – Kay Gee Mar 22 '21 at 01:09
  • @alex: e.g. ```$1: col1,col2,col3,col4 $2: 5,6,c,7 $3: 8,9,x,10 $4: 8,7,e,5``` – Kay Gee Mar 22 '21 at 01:10
  • @KayGee, my answer is based on the pattern detection, not the column position. It should be fine if there's no empty value for the col3. – alex Mar 22 '21 at 01:20
  • Could maybe write the modified CSV to a new file then import that file. Are you using semi-colon as separator in VALUES clause? Where are the parentheses to enclose that clause? What does the constructed SQL statement look like? Another reference https://stackoverflow.com/questions/17997558/bash-script-to-insert-values-in-mysql – June7 Mar 22 '21 at 01:59
  • @alex I was able to incorporate this into my script, thank you so much! One more question: how can I escape minus numbers? I tweaked the ```/^[0-9.]+$/``` part but to no avail. – Kay Gee Mar 22 '21 at 05:23
  • @KayGee, I've updated the answer to deal with the negative numbers. The regexp was just changed to `/^[0-9.-]+$/`. – alex Mar 22 '21 at 10:34

1 Answers1

1

Given the following csv file:

/tmp/csv:

Year,Make,Model,Description,Price
1997,Ford,E350,moon,-3000.00
1997,Ford,E350,moon,3000.00
1999,Chevy,Venture Extended Edition,,4900.00
1999,Chevy,Venture Extended Edition Very Large,,5000.00

And the code:

awk -F, 'OFS=FS {for (i=1;i<=NF;i++) {if (match($i, /^[0-9.-]+$/)==0) {printf "\"" $i "\""} else {printf $i}; if (i<NF) printf OFS}; printf "\n"}' /tmp/csv

It outputs:

All the string types of values are quoted.

"Year","Make","Model","Description","Price"
1997,"Ford","E350","moon",3000.00
1999,"Chevy","Venture Extended Edition","",4900.00
1999,"Chevy","Venture Extended Edition Very Large","",5000.00

The code is based on these assumptions:

  • There are no literal delimiters(the comma , here) inside the field values.
  • There are no literal newlines inside the field values.
  • There are no null values for the integer or the decimal fields.

If the assumptions don't stand for your data, you may need a CSV parser like this. And the exceptional field values should be well quoted, otherwise, even a parser won't know what to do.

BTW: Unless your data is very simple and the row number is a few, this is not the recommended way to manipulate the data. Many DBMS has an import/load utility that can be used in a program to do such a job.

alex
  • 799
  • 7
  • 8