10

I want to generate sql insert statements for some csv file.

I can do this using a one-liner awk script such as:

awk -F "\t" '{printf("INSERT INTO T_COMMON_ENUM_VALUE (id,name,category_id) values (%s, '\''%s'\'', %s);\n", $1, $2, $3)}'
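For example, a made-up row piped through a three-column variant of that one-liner:

```shell
# sample tab-separated row (made-up data) fed to the awk one-liner
printf '800\tname_a\t9\n' | \
  awk -F "\t" '{printf("INSERT INTO T_COMMON_ENUM_VALUE (id,name,category_id) values (%s, '\''%s'\'', %s);\n", $1, $2, $3)}'
# prints: INSERT INTO T_COMMON_ENUM_VALUE (id,name,category_id) values (800, 'name_a', 9);
```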

But this still requires some effort. csvsql inside csvkit seems to generate insert statements automatically. I checked the documentation and used the following command but it doesn't generate insert statements.

$ cat data02.csv
db_enumvalue_id db_enumvalue_name       db_enumcategory_id
800     şirin   9

$ csvsql data02.csv
CREATE TABLE data02 (
        db_enumvalue_id INTEGER NOT NULL,
        db_enumvalue_name VARCHAR(18) NOT NULL,
        db_enumcategory_id INTEGER NOT NULL
);

It generates the create table statements. But the documentation says:

Generate SQL statements for a CSV file or execute those statements directly on a database.

What should I do to get the insert sql statements using csvkit?

Mert Nuhoglu
  • the documentation says to specify the database and add a --insert argument like so: --db postgresql:///test --insert examples/*.csv. However, I find the behavior is equivalent to the above, where the table is created but the rows are not inserted. – daj Jul 05 '16 at 22:10
  • Looking at the [source](https://github.com/wireservice/csvkit/blob/master/csvkit/utilities/csvsql.py), it will only directly execute insert statements, not output them. – Blorgbeard Jul 05 '16 at 22:31
  • @Blorgbeard nonetheless, it doesn't seem to execute them. When I look at the database contents the table is there but there are no rows (even though there are in the csv). – daj Jul 05 '16 at 23:53

4 Answers

5

Here is an entirely data-driven way. Slightly goofy, but it works.

#!/usr/bin/env bash

##
## ensure script stops on errors
set -eu
set -o pipefail

##
## load your data into a SQLite DB
csvsql test.csv --db=sqlite:///test.db --insert

##
## let SQLite generate the inserts
echo ".dump test" | sqlite3 test.db

Run that and you get something like:

BEGIN TRANSACTION;
CREATE TABLE test (
    id INTEGER NOT NULL,
    month VARCHAR(5) NOT NULL,
    market FLOAT NOT NULL,
    acme FLOAT NOT NULL
);
INSERT INTO "test" VALUES(1,'1/86',-0.061134,0.03016);
INSERT INTO "test" VALUES(2,'2/86',0.00822,-0.165457);
INSERT INTO "test" VALUES(3,'3/86',-0.007381,0.080137);
...
INSERT INTO "test" VALUES(60,'12/90',-0.026401,-0.190834);
COMMIT;

Capture that output in a file and you're golden. It could also be wrapped in a Python subprocess call, or piped into a database command-line client, if you want to automate the insert phase as well.
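If you only want the INSERT statements (no CREATE TABLE or transaction wrapper), you can filter the dump; a sketch, assuming the test.db built by the script above:

```shell
# keep only the INSERT lines from the sqlite3 dump
echo ".dump test" | sqlite3 test.db | grep '^INSERT' > inserts.sql
```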

If you're inserting into an existing table, this approach works for a DBMS like MySQL that does implicit type conversions, since CSV can't define data types for fields beyond strings vs. numbers. For a stricter DBMS like Postgres, you would need to edit the data types in the generated script.
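That edit can be scripted too; a hypothetical sed pass over the dump saved as dump.sql (the file name and the NUMERIC precision here are made up):

```shell
# rewrite the dump's FLOAT columns to an explicit NUMERIC type for Postgres
sed 's/ FLOAT NOT NULL/ NUMERIC(12,6) NOT NULL/' dump.sql > dump_pg.sql
```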

Chris Johnson
  • Hitting a problem when I round-trip through sqlite3, getting 0.29999999999999998889 instead of 0.3 (lack of DECIMAL type, binary floating point issues introduced when value round-trip through sqlite3). – yzorg Dec 20 '22 at 16:53
1

Looking at the documentation, I'd guess something like this:

If the destination table T_COMMON_ENUM_VALUE already exists:

$ csvsql --tables T_COMMON_ENUM_VALUE --insert --no-create data02.csv

If the destination table T_COMMON_ENUM_VALUE does not exist:

$ csvsql --tables T_COMMON_ENUM_VALUE --insert data02.csv

This should only output the script; if you want the statements executed against your db, you need to add --db CONNECTION_STRING, for example:

$ csvsql --db mssql://user:pass@host:port/database --tables T_COMMON_ENUM_VALUE --insert data02.csv

I hope this helps

MtwStark
0

There is probably no straightforward way to generate an insert or delete SQL script using csvsql; at least I could not find one when I needed it desperately. However, the --query option comes to our rescue by allowing us to prepare such statements ourselves. It takes a bit of effort, but it pays off very well.

Here is an example. I have a csv file called test2.csv, and this is how I generate insert statements for all its rows:

csvsql  --query "SELECT 'insert into test2  values (''' ||    
trim(ifnull(my_table.survived, 'null123')) ||''' ,''' ||     
trim(ifnull(my_table.RECORD_TYPE, 'null123')) ||''' ,''' ||     
trim(ifnull(my_table.BASE_HIN, 'null123')) ||''' ,''' ||     
trim(ifnull(my_table.SUFFIX, 'null123')) ||''',''' ||     
trim(ifnull(my_table.name, 'null123')) ||''' ,''' ||     
trim(ifnull(my_table.ADDRESS_1, 'null123')) ||'''  ,''' ||     
trim(ifnull(my_table.ADDRESS_2, 'null123')) ||'''  ,''' ||     
trim(ifnull(my_table.CITY, 'null123')) ||'''  ,''' ||     
trim(ifnull(my_table.STATE, 'null123')) ||'''  ,''' ||     
trim(ifnull(my_table.PRIMARY_ZIP, 'null123')) ||'''  ,''' ||     
trim(ifnull(my_table.COUNTRY, 'null123')) ||'''  ,''' ||     
trim(ifnull(my_table.TELEPHONE, 'null123')) ||'''  ,''' ||     
trim(ifnull(my_table.CLASS_OF_TRADE, 'null123')) ||'''  ,''' ||     
trim(ifnull(my_table.DEA, 'null123')) ||'''  ,''' ||     
trim(ifnull(my_table.DATE_HIN_ASSIGNED, 'null123')) ||'''  ,''' ||     
trim(ifnull(my_table.DATE_LAST_EDITED, 'null123')) ||'''  ,''' ||     
trim(ifnull(my_table.STATUS, 'null123')) ||'''  ,''' ||     
trim(ifnull(my_table.VERIFIED, 'null123')) ||'''  ,''' ||     
trim(ifnull(my_table.LAST_ACTION_CODE, 'null123')) ||'''  ,''' ||     
trim(ifnull(my_table.REASON_CODE, 'null123')) ||'''  ,''' ||     
trim(ifnull(my_table.REFERBACK_CODE, 'null123')) ||'''  ,''' ||     
trim(ifnull(my_table.SUBSET, 'null123')) ||'''  ,''' ||     
trim(ifnull(my_table.UPIN, 'null123')) ||'''  ,''' ||     
trim(ifnull(my_table.SPECIALTY, 'null123')) ||'''  ,''' ||     
trim(ifnull(my_table.FAX, 'null123')) ||''' )' FROM test2 as my_table WHERE survived='0'" test2.csv > insert.sql

Make sure that this is one single line because csvsql does not like broken lines.

Also, once insert.sql is prepared, you need to remove the header line, remove the double quotes from all rows, and find-and-replace = 'null123' with is null.
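Those manual cleanups can themselves be scripted; a rough sed sketch (the patterns assume the exact quoting shown above, so double-check against your own output):

```shell
# drop the header line, strip double quotes, and turn the
# "= 'null123'" sentinel comparisons into "is null"
sed -e '1d' -e 's/"//g' -e "s/= 'null123'/is null/g" insert.sql > insert_clean.sql
```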

Jonathan Leffler
0

https://github.com/Ahmed-M-Salah/CsvToSql

A C# project; an updated fork of a project that is now 5-7 years old.

yzorg