
Is there a faster way to insert these dictionary records? The dictionary "final" generally has about 153,000 records in it. Inserting all of them currently takes 1 minute 18 seconds on a Core i5 processor, with everything running locally.

import sqlite3

def output_db(final, sqlite_file):
    conn = sqlite3.connect(sqlite_file)
    c = conn.cursor()

    field_names_dict = get_field_names(final)
    sql_field_names = ",".join(field_names_dict.keys())
    for key, row in final.items():
        # Build one INSERT statement per row by string concatenation,
        # swapping double quotes in the values for single quotes so the SQL parses.
        insert_sql = 'INSERT INTO asset(' + sql_field_names + ') VALUES ("' + '","'.join(field_value.replace('"', "'") for field_value in list(row.values())) + '")'
        c.execute(insert_sql)

    conn.commit()
    conn.close()
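
For reference, here is a minimal sketch of the `executemany` approach suggested in the comments below. It assumes the same `asset` table and the `get_field_names` helper shown above; the statement is prepared once and each row is bound as parameters, so there is no per-row SQL string building and no manual quote escaping.

import sqlite3

def output_db_executemany(final, sqlite_file):
    conn = sqlite3.connect(sqlite_file)
    c = conn.cursor()

    # Assumes the same get_field_names() helper as above.
    field_names = list(get_field_names(final).keys())
    sql = "INSERT INTO asset({}) VALUES ({})".format(
        ",".join(field_names),
        ",".join("?" for _ in field_names))

    # Each row is passed to SQLite as bound parameters, so the values
    # never need to be escaped or concatenated into the SQL text.
    c.executemany(sql, (tuple(row.values()) for row in final.values()))

    conn.commit()
    conn.close()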

Here is what each row of data typically looks like:

("computer_name", comp_name), 
("domain_name", domain),
("serial_num", serial),
("device_type", type),  
("mfr", mfr), 
("model", model),
("os_type", os_type),
("os_ver", os_ver),
("os_subver", os_subver),
("country", country),
("state", state),
("history", "\n".join(hist)),
("cmdb_resource_name", cmdb_resource_name), 
("cmdb_dns_name", cmdb_dns_name),
("cmdb_serial_number", cmdb_serial),
("cmdb_family", cmdb_family), 
("cmdb_manufacturer", cmdb_manufacturer),
("cmdb_model", cmdb_model),
("cmdb_class", cmdb_class),
("cmdb_operating_system", cmdb_operating_system),
("cmdb_location_name", cmdb_location_name),
("cmdb_country", cmdb_country),
("bay_name", bay_name),
("bay_dns_name", bay_dns_name),
("bay_mfr", bay_mfr), 
("bay_model", bay_model), 
("bay_serial_number", bay_serial_number), 
("bay_device_type", bay_device_type),
("bay_os_name", bay_os_name),
("bay_os_sp", bay_os_sp),
("bay_os_ver", bay_os_ver), 
("tan_id", tan_id),
("tan_comp_name", tan_comp_name),
("tan_os", tan_os),
("tan_os_build", tan_os_build),
("tan_os_sp", tan_os_sp),
("tan_country_code", tan_country_code),
("tan_mfr", tan_mfr),
("tan_model", tan_model),
("tan_serial", tan_serial)
  • Inserting the data row by row is the slowest way to do it. What is the format of the *original* data? Why not use the shell's `.import` command? – Panagiotis Kanavos Jan 23 '17 at 14:20
  • FYI, I tried having the loop build one VERY long INSERT statement... that actually slowed things down; the INSERT string itself is about 400 MB. Lol. – gunslingor Jan 23 '17 at 14:32
  • Instead of building one huge string (why?), build one CSV file and use `.import`. Or use [executemany](https://docs.python.org/2/library/sqlite3.html), mentioned in the second answer to the duplicate question - no loops, no string concatenation. Or partition the data as shown in the duplicate and use a single transaction per chunk. – Panagiotis Kanavos Jan 23 '17 at 14:40
  • Panagiotis, could you provide an answer with the `.import` solution? One of my design requirements is to output a CSV file anyway, so if `.import` is faster it definitely seems like the way to go. I'm sure I'll need to fix a few spurious commas and newline chars in my CSV to make this work. – gunslingor Jan 23 '17 at 14:42
  • There are some tools to validate CSV. – Marichyasana Jan 23 '17 at 15:12
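
A rough sketch of the CSV-plus-`.import` route discussed in the comments, assuming Python 3.7+, that the `sqlite3` command-line shell is on the PATH, and that the CSV column order matches the `asset` table's columns (the `asset_export.csv` name is hypothetical):

import csv
import subprocess

def output_csv_and_import(final, sqlite_file, csv_file="asset_export.csv"):
    # Let the csv module handle embedded commas, quotes and newlines
    # (e.g. the joined "history" field) instead of fixing them by hand.
    with open(csv_file, "w", newline="") as f:
        writer = csv.writer(f)
        for row in final.values():
            writer.writerow(list(row.values()))

    # Feed the shell its commands on stdin; .import bulk-loads the file
    # into the existing asset table.
    subprocess.run(
        ["sqlite3", sqlite_file],
        input=".mode csv\n.import {} asset\n".format(csv_file),
        text=True, check=True)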
