
I have a CSV file and want to generate dumps of the data for sqlite, mysql, postgres, oracle, and mssql.

Is there a common API (ideally Python based) to do this?

I could use an ORM to insert the data into each database and then export dumps, however that would require installing each database. It also seems a waste of resources - these CSV files are BIG.

I am wary of trying to craft the SQL myself because of the variations with each database. Ideally someone has already done this hard work, but I haven't found it yet.

hoju

3 Answers


SQLAlchemy is a database library that (as well as providing ORM functionality) supports SQL generation in the dialects of all the databases you mention (and more).

In normal use, you would create a SQL expression/statement (using a schema.Table object), create a database engine, and then bind the statement to the engine to generate the SQL.

However, the engine is not strictly necessary; each dialect has a compiler that can generate the SQL without a connection. The only caveat is that you need to stop it from generating bind parameters, as it does by default:

from sqlalchemy.sql import expression, compiler
from sqlalchemy import schema, types
import csv

# example for mssql
from sqlalchemy.dialects.mssql import base
dialect = base.dialect()
compiler_cls = dialect.statement_compiler
class NonBindingSQLCompiler(compiler_cls):
    def _create_crud_bind_param(self, col, value, required=False, **kw):
        # Despite the method name, render the literal value
        # rather than creating a bind parameter
        return self.render_literal_value(value, col.type)

recipe_table = schema.Table(
    "recipe", schema.MetaData(),
    schema.Column("name", types.String(50), primary_key=True),
    schema.Column("culture", types.String(50)),
)

for row in [{"name": "fudge", "culture": "america"}]:  # csv.DictReader(open("x.csv", "r")):
    insert = expression.insert(recipe_table, row, inline=True)
    c = NonBindingSQLCompiler(dialect, insert)
    c.compile()
    sql = str(c)
    print(sql)

The above example actually works. It assumes you know the target database table schema, and it should be easy to adapt to import from a CSV and generate SQL for multiple target database dialects.
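As a sketch of that adaptation (on newer SQLAlchemy versions you can skip the compiler subclass entirely and pass the `literal_binds` compile option; the table schema and CSV contents here are made up for illustration):

```python
import csv
import io

from sqlalchemy import Column, MetaData, String, Table, insert
from sqlalchemy.dialects import mysql, postgresql, sqlite

recipe = Table(
    "recipe", MetaData(),
    Column("name", String(50), primary_key=True),
    Column("culture", String(50)),
)

# Stand-in for open("x.csv", "r")
csv_file = io.StringIO("name,culture\nfudge,america\n")

dialects = {
    "sqlite": sqlite.dialect(),
    "mysql": mysql.dialect(),
    "postgresql": postgresql.dialect(),
}

for row in csv.DictReader(csv_file):
    stmt = insert(recipe).values(**row)
    for name, dialect in dialects.items():
        # literal_binds renders values inline instead of as bind parameters
        sql = str(stmt.compile(dialect=dialect,
                               compile_kwargs={"literal_binds": True}))
        print(name, sql)
```

Each dialect emits its own flavour of the INSERT (quoting style, etc.) without any database connection.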

David Fraser
  • Nice design of the solution (+1) although it looks like it will take forever to process a **BIG** (to use the OP's words) CSV table... – mac Nov 25 '10 at 13:56
  • It would be faster to just run sqlalchemy against the actual database; but he wanted SQL. I don't actually think it would be ridiculously slow though; obviously if you knew the data types etc you could create a much faster recipe but this has the advantage of using an existing library – David Fraser Nov 25 '10 at 17:56
  • See http://stackoverflow.com/a/36141722/120398 - apparently you can more simply do this by saying `q.statement.compile(compile_kwargs={"literal_binds": True})` – David Fraser May 02 '17 at 17:07

I am no database wizard, but AFAIK in Python there's no common API that does out of the box what you ask for. There is PEP 249, which defines an API that modules accessing databases should use, and AFAIK it is followed at least by the MySQL and PostgreSQL Python modules (here and here); perhaps that could be a starting point.
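For illustration, the PEP 249 interface looks the same regardless of the backend. Here it is with the standard library's sqlite3 module; the MySQL and PostgreSQL modules expose the same connect/cursor/execute shape, although their parameter placeholder styles differ (`?` vs `%s`):

```python
import sqlite3

# sqlite3 implements PEP 249 (DB-API 2.0); MySQLdb and psycopg2
# follow the same connect/cursor/execute pattern.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE recipe (name TEXT, culture TEXT)")
cur.execute("INSERT INTO recipe VALUES (?, ?)", ("fudge", "america"))
conn.commit()

cur.execute("SELECT culture FROM recipe WHERE name = ?", ("fudge",))
row = cur.fetchone()
print(row[0])  # america
```

Swapping the backend means swapping the `connect()` call (and the placeholder style), but the rest of the code stays the same.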

The road I would attempt to follow myself - however - would be another one:

  1. Import the CSV into MySQL (just because MySQL is the one I know best and there is tons of material on the net, as for example this very easy recipe, but you could do the same procedure starting from another database).
  2. Generate the MySQL dump.
  3. Process the MySQL dump file in order to modify it to meet SQLite (and others) syntax.

The scripts for processing the dump file can be very compact, although they might be somewhat tricky if you use regexes to parse the lines. Here's an example MySQL → SQLite script that I simply pasted from this page:

#!/bin/sh 
mysqldump --compact --compatible=ansi --default-character-set=binary mydbname | 
grep -v ' KEY "' | 
grep -v ' UNIQUE KEY "' | 
perl -e 'local $/;$_=<>;s/,\n\)/\n\)/gs;print "begin;\n";print;print "commit;\n"' | 
perl -pe ' 
if (/^(INSERT.+?)\(/) { 
$a=$1; 
s/\\'\''/'\'\''/g; 
s/\\n/\n/g; 
s/\),\(/\);\n$a\(/g; 
} 
' | 
sqlite3 output.db

You could write your script in Python (in which case you should have a look at re.compile for performance).
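A rough Python equivalent of the core transformations in the shell script above (the dump line is made up, and a real script would need to handle more MySQL-isms; the patterns are precompiled with re.compile since they run once per line of a big file):

```python
import re

KEY_LINE = re.compile(r'^\s*(UNIQUE )?KEY "')   # index lines SQLite cannot parse
INSERT_HEAD = re.compile(r"^(INSERT[^(]+\()")   # captures e.g. 'INSERT INTO "t" VALUES ('
ROW_SPLIT = re.compile(r"\),\(")                # boundary between rows in a multi-row INSERT

def mysql_dump_to_sqlite(lines):
    out = ["BEGIN;"]
    for line in lines:
        if KEY_LINE.match(line):
            continue  # drop index definitions
        head = INSERT_HEAD.match(line)
        if head:
            # turn multi-row INSERTs into one INSERT per row
            line = ROW_SPLIT.sub(");\n" + head.group(1), line)
        out.append(line.rstrip("\n"))
    out.append("COMMIT;")
    return "\n".join(out)

result = mysql_dump_to_sqlite(['INSERT INTO "t" VALUES (1,\'a\'),(2,\'b\');\n'])
print(result)
```

Wrapping everything in a single BEGIN/COMMIT, as the shell script does, makes the import dramatically faster than one transaction per INSERT.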

The rationale behind my choice would be:

  1. I get the heavy lifting [importing, and therefore data consistency checks, plus generating the starting SQL file] done for me by MySQL.
  2. I only have to have one database installed.
  3. I have full control over what is happening and the ability to fine-tune the process.
  4. I can structure my script so that it is very easy to extend for other databases (basically I would structure it like a parser that recognises individual fields, plus a set of grammars - one for each database - that I can select via command-line option).
  5. There is much more documentation on the differences between SQL flavours than on individual DB import/export libraries.

EDIT: A template-based approach

If for any reason you don't feel confident enough to write the SQL yourself, you could use a sort of template-based script. Here's how I would do it:

  1. Import and generate a dump of the table in all the databases you are planning to use.
  2. For each DB, save the initial part of the dump (the schema declaration and all the rest) plus a single insert instruction.
  3. Write a Python script that - for each DB export - outputs the "header" of the dump plus the same "saved line", into which you programmatically substitute the values from each line of your CSV file.

The obvious drawback of this approach is that your "template" will only work for one table. The strongest point of it is that writing such script would be extremely easy and quick.
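A minimal sketch of such a template script (the table, column names, and template strings are made up; a real version would load the saved header and insert line for each target database):

```python
import csv
import io

# Hypothetical fragments saved from a one-off dump of the target database
HEADER = "CREATE TABLE recipe (name VARCHAR(50), culture VARCHAR(50));"
INSERT_LINE = "INSERT INTO recipe (name, culture) VALUES ('{name}', '{culture}');"

def dump_from_csv(csv_text):
    out = [HEADER]
    for row in csv.DictReader(io.StringIO(csv_text)):
        # naive escaping: double any single quotes in the values
        safe = {k: v.replace("'", "''") for k, v in row.items()}
        out.append(INSERT_LINE.format(**safe))
    return "\n".join(out)

sql_dump = dump_from_csv("name,culture\nfudge,america\n")
print(sql_dump)
```

Note the quoting is deliberately naive; anything beyond simple string columns (NULLs, numbers, binary data) would need per-type handling, which is exactly where a library like SQLAlchemy earns its keep.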

HTH at least a bit!

mac
  • step 3 is what I am concerned about. Is there anything available that creates the insert statements for each database? – hoju Nov 23 '10 at 04:51
  • There are plenty of commercial utilities that can do the conversion automatically between some of the most common DBs (they are priced between 40 and 900 US$). However, these utilities normally deal with the full range of DB nuts and bolts (indexes, foreign keys, etc...). Also, I am not sure if those utilities are scriptable or if they only work via a GUI. – mac Nov 23 '10 at 15:13
  • Sorry, got interrupted! I am unaware of FLOSS alternatives to the above mentioned utilities (ex: http://tinyurl.com/2wyszoq). Honestly, to just import a table I would rather look into the differences between the various SQL dialects myself: they are normally pretty trivial when it comes to defining the schema and writing the insert statements. In case of doubts, you could create an initial import and dump for each of the DBs you are planning to use, and make sure that your "transforming code" generates a file identical to that generated by the target DB itself. – mac Nov 23 '10 at 15:25
  • @Plumo - See also my edit at the bottom of the original answer. – mac Nov 23 '10 at 15:37
  • this would then require installing each database, which I am trying to avoid. I agree the SQL dialects are not complex, but it will require testing (=time) to get it right. I am surprised no one has done this work already. – hoju Nov 24 '10 at 03:24

You could do this - Create SQL tables from CSV files

or Generate Insert Statements from CSV file

or try this Generate .sql from .csv python

Of course you might need to tweak the scripts mentioned to suit your needs.

Srikar Appalaraju
  • but there are slight differences between each database format – hoju Nov 08 '10 at 11:07
  • then I suppose you need to handle each separately, but those scripts should help you get started – Srikar Appalaraju Nov 08 '10 at 11:13
  • I was not the downvoter, but my guess is that whoever did it, did it because your answer is nothing more than google hits [legitimate answer and thus not deserving a down vote if you ask me, yet a bit sloppy] ;) – mac Nov 22 '10 at 00:18