SQLAlchemy bulk insert with missing values using default values

Question

Problem Statement -

I want to bulk insert a few hundred rows using SQLAlchemy. The schema looks like following

all_scrips_tbl = Table('all_scrips_info', _METADATA, Column('security_isin', String(16), primary_key=True), Column('company_name', String(80)), Column('nse_traded', Boolean, default=False), Column('nse_start_date', Date, default=datetime.date(year=2001, day=1, month=1)), Column('bse_traded', Boolean, default=False), Column('bse_start_date', Date, default=datetime.date(year=2001, day=1, month=1)), ) Now each scrip can either be - nse_traded=True, bse_traded=True or both nse_traded=True and bse_traded=True

So I have insert statements like -

For securities that have only nse_traded=True ins = t.insert().values(security_isin=nstock.isin, company_name=nstock.name, nse_traded=True, nse_start_date=nstart_date, )

For securites that have only bse_traded=True -

ins = t.insert().values(security_isin=bstock.isin, company_name=bstock.name, bse_traded=True, bse_start_date=bstart_date)

and correspondingly for nse_traded=True and bse_traded=True

I'd like to bulk insert those statements. So something like values().compile with default values from create statement would be very useful so that I can then use the following -

conn.execute(all_scrips_info.insert() , [ {}, {} ] ) Where dicts are populated with defaults as appropriate?

I also looked at this question, but this is slightly different than my requirement. There's an old question on google groups which is similar to my requirement. But the sqlalchemy version there is rather old, plus answers are not very easily understandable.

Am I missing something very obvious?

Am I understanding correctly that you have a dataset in which there are records for the three groups: (1) nse_traded, (2) bse_traded and (3) nse_traded and bse_traded and you want to bulk insert these records? — mtndoe, Jul 17 '17 at 10:34
Yes that's right - I also have Defaults defined wherever applicable. What I am doing right now is generating `insert` statements using `values` and running them in a loop like `execute(insert)`. I'd like to use `execute(insert, [{}, {}])` where values from default are used when i don't use them explicitly in `insert.values`. — gabhijit, Jul 18 '17 at 02:05

mtndoe · Answer 1 · 2017-07-18T16:37:17.540

Below I provide a simple example of how you can more easily insert data from a CSV file into a database using SQLAlchemy. The default value here is the date created. In the init function you can add more logic... This logic is applied when creating an object.

I want to point out that other methods are available. See for example the SQLAlchemy documentation: http://docs.sqlalchemy.org/en/latest/orm/persistence_techniques.html#bulk-operations

Example:

import csv
from io import StringIO
from sqlalchemy import Column, Integer, String, DateTime
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from datetime import datetime

# Setting up model

Base = declarative_base()


class Car(Base):
    __tablename__ = 'car'

    id = Column('id', Integer, primary_key=True)
    created = Column('created', DateTime, default=datetime.now())
    brand = Column(String(250))
    nr_gears = Column('nr_gears', Integer)

    def __init__(self, brand, nr_gears):
        self.brand = brand
        self.nr_gears = nr_gears

        # You can place logic,conditions here when creating the objects. Now only use of default date - creation

    def __repr__(self):
        return '{created}\t{brand} (nr of gears: {nr_gears})'.format(
            created=datetime.strftime(self.created, '%d/%m/%Y'), brand=self.brand, nr_gears=self.nr_gears)


engine = create_engine('sqlite:///')
session = sessionmaker()
session.configure(bind=engine)
ex_ses = session()
Base.metadata.create_all(engine)

# Reading in some dummy data

csv_data = \
    '''Audi,5
    BMW,5
    Mercedes-Benz,5
    Opel,5
    Porsche,6
    Volkswagen,5
    Acura,5
    Datsun,6
    Honda,6
    Infiniti,5
    Isuzu,6
    Lexus,5
    Mazda,5
    Mitsubishi,5
    Nissan,5
    Suzuki,6
    Toyota,6
    Subaru,6'''

with StringIO(csv_data) as car_csv:
    cars_data = []
    readCSV = csv.reader(car_csv, delimiter=',')
    for row in readCSV:
        # Add data to the session.
        ex_ses.add(Car(brand=row[0], nr_gears=row[1]))

ex_ses.commit()

# Query
cars = (ex_ses.query(Car).all())
print('Created\t\tBrand (nr of gears)')
print('-' * 45)
for car in cars:
    print(car)

# output:
'''
Created     Brand (nr of gears)
---------------------------------------------
18/07/2017  Audi (nr of gears: 5)
18/07/2017  BMW (nr of gears: 5)
18/07/2017  Mercedes-Benz (nr of gears: 5)
18/07/2017  Opel (nr of gears: 5)
18/07/2017  Porsche (nr of gears: 6)
18/07/2017  Volkswagen (nr of gears: 5)
18/07/2017  Acura (nr of gears: 5)
18/07/2017  Datsun (nr of gears: 6)
18/07/2017  Honda (nr of gears: 6)
18/07/2017  Infiniti (nr of gears: 5)
18/07/2017  Isuzu (nr of gears: 6)
18/07/2017  Lexus (nr of gears: 5)
18/07/2017  Mazda (nr of gears: 5)
18/07/2017  Mitsubishi (nr of gears: 5)
18/07/2017  Nissan (nr of gears: 5)
18/07/2017  Suzuki (nr of gears: 6)
18/07/2017  Toyota (nr of gears: 6)
18/07/2017  Subaru (nr of gears: 6)
'''

Hope this helps.

Edit: based on the comments I have expanded the Car class below. If the number of gears is missing... the default value of nine is used. This is of course a limited example but (I hope) shows the possibilities in working with default values…

class Car(Base):
    __tablename__ = 'car'

    id = Column('id', Integer, primary_key=True)
    created = Column('created', DateTime, default=datetime.now())
    brand = Column(String(250))
    nr_gears = Column('nr_gears', Integer,default=None)

    def __init__(self, brand, nr_gears):
        self.brand = brand

        if nr_gears is None or nr_gears == '':
            nr_gears = 9

        self.nr_gears = nr_gears

        # You can place logic,conditions here when creating the objects. Now only use of default date - creation

    def __repr__(self):
        return '{created}\t{brand} (nr of gears: {nr_gears})'.format(
            created=datetime.strftime(self.created, '%d/%m/%Y'), brand=self.brand, nr_gears=self.nr_gears)

There are a few issues with your solution and hence can't accept it. 1. Imagine if your CSV files didn't have 'gears' entry at all for some models and in your DDL you mentioned it as 5. so whenever I don't specify it, use 5 (the very purpose of 'default' attribute). The problem with SA bulk insert is - The syntax `execte(table.insert, [{dict of values}])` mandates that the size be 'same for all dictionaries`. If my data indeed came from one CSV file, I might be able to write this - it comes from two sets (populated from some different places). — gabhijit, Jul 18 '17 at 07:21
Disregarding the problem with SA bulk insert. The further question on how to define and insert a default value for a missing value (if the value is completely missing from the CSV...0? This can be realised in the init. If this (my interpretation of the question) is correct I will update the example. — mtndoe, Jul 18 '17 at 07:35
Yes - but i don't want a heavy lifting 'outside' SA - I was just thinking if there's an easier way that I missed and SA already provides that. — gabhijit, Jul 18 '17 at 11:49

SQLAlchemy bulk insert with missing values using default values

1 Answers1