
I'm trying to do a bulk insert into CrateDB using Python. The executemany command doesn't seem to perform a real bulk insert the way it does with SQL Server through pyodbc. With pyodbc I can use this:

cursor.fast_executemany = True

to solve the problem, as mentioned here. But the "crate" library for Python doesn't offer this option. Is there a workaround?

1 Answer


Unlike pyodbc, the CrateDB Python driver does a real bulk insert without that option. See the example in our documentation: https://crate.io/docs/clients/python/en/latest/client.html#inserting-data
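As a rough illustration of the point above, here is a minimal sketch that passes rows to executemany in batches. The table name `readings`, its columns, the batch size and the helper names `chunked`, `bulk_insert` and `main` are assumptions made up for this example; only `client.connect`, `cursor()` and `executemany` come from the crate driver. It also assumes the table already exists and a node is listening on localhost:4200.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def bulk_insert(cursor, sql, rows, batch_size=1000):
    """Pass rows to cursor.executemany in batches and return the row count sent."""
    sent = 0
    for batch in chunked(rows, batch_size):
        # One executemany call per batch; no fast_executemany flag is needed.
        cursor.executemany(sql, batch)
        sent += len(batch)
    return sent


def main():
    # Needs `pip install crate` and a running CrateDB node (assumed URL below).
    from crate import client

    connection = client.connect("http://localhost:4200")
    cursor = connection.cursor()
    # Hypothetical table and data, for illustration only.
    rows = [(i, f"sensor-{i}") for i in range(10_000)]
    bulk_insert(cursor, "INSERT INTO readings (id, name) VALUES (?, ?)", rows)
    connection.close()
```

Calling `main()` runs the insert against the local node. The batch size is worth tuning: very small batches mean many round trips, very large ones mean large request bodies.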

  • I compared SQL Server's insert speed with CrateDB's, and the result was not what I expected: SQL Server is faster when both DBMSs are tested with the same executemany call (with cursor.fast_executemany = True in the SQL Server Python script; with this option set to False, SQL Server is much slower). Am I making a mistake in the CrateDB configuration? I left the .yml configuration file at its defaults and I'm running CrateDB on localhost. – Davide Mizzaro Mar 08 '18 at 12:35
  • Not sure, it depends on your setup. How many nodes do you run? On what operating system? – Johannes Moser Mar 08 '18 at 12:48
  • One node on Windows 10 Home – Davide Mizzaro Mar 08 '18 at 13:02
  • CrateDB really takes off when you run it on several nodes, and performance on Unix-based systems is also better. You might want to use several connections writing to CrateDB, and also make sure sharding is optimized. – Johannes Moser Mar 08 '18 at 13:41
  • OK, thank you! I'll try using more nodes. I expect to find a break-even point at which CrateDB overtakes SQL Server as the volume of inserted data increases. – Davide Mizzaro Mar 08 '18 at 13:52
  • Let me know how that goes. – Johannes Moser Mar 12 '18 at 19:26
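
The suggestion in the comments to write over several connections could look roughly like the sketch below. It is not taken from the CrateDB documentation: the helper names `split_evenly` and `parallel_insert`, the worker count and the `connect` factory argument are all assumptions for illustration. Each worker thread opens its own connection and sends its slice of the rows with a single executemany call.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(rows, parts):
    """Split rows into `parts` contiguous slices of near-equal size."""
    size, extra = divmod(len(rows), parts)
    slices, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        slices.append(rows[start:end])
        start = end
    return slices


def parallel_insert(connect, sql, rows, workers=4):
    """Insert rows using one connection per worker thread.

    `connect` is any zero-argument callable returning a DB-API style
    connection (hypothetical parameter, chosen to keep the sketch testable).
    """
    def write(part):
        if not part:
            return 0
        connection = connect()
        try:
            connection.cursor().executemany(sql, part)
        finally:
            connection.close()
        return len(part)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(write, split_evenly(rows, workers)))
```

With the crate driver installed, the factory could be something like `lambda: client.connect("http://localhost:4200")` after `from crate import client`. Whether this actually beats a single connection depends on the number of nodes and on how the table is sharded, so it is worth measuring rather than assuming.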