
I have a large-data.csv:

id,foo,bar,column1,column2
1,"abc",0,"gh",42
2,"xy",1,"bla",1337

and an SQLAlchemy model:

class Asset(db.Model):
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    abc = db.Column(db.String)  # not in CSV
    foo = db.Column(db.String)
    bar = db.Column(db.Boolean)
    column1 = db.Column(db.String)
    column2 = db.Column(db.Integer)

How can I read the CSV into the database?

Setting things up

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__, template_folder='templates')
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///db.db'
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
db = SQLAlchemy(app)


class Asset(db.Model):
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    abc = db.Column(db.String)  # not in CSV
    foo = db.Column(db.String)
    bar = db.Column(db.Boolean)
    column1 = db.Column(db.String)
    column2 = db.Column(db.Integer)

db.create_all()
Martin Thoma
  • Try this solution; it doesn't directly use SQLAlchemy, but it does populate the DB from a CSV: https://stackoverflow.com/a/23104436/8965773 – nexla May 16 '19 at 09:31
  • Might also be helpful: http://www.sqlitetutorial.net/sqlite-import-csv/ – Klaus D. May 16 '19 at 09:34

2 Answers


As you’re using SQLAlchemy and Flask, I’d suggest that you use SQLAthanor, a serialization library for SQLAlchemy with integrations for Flask and Flask-SQLAlchemy (full disclosure: I’m the author of SQLAthanor).

Basically, what SQLAthanor allows you to do is to define custom serialization and de-serialization rules for your SQLAlchemy declarative models, and then read data from CSV, JSON, or YAML to create records/instances.

The documentation is fairly comprehensive and really shows the full range of capabilities, but in essence:

  • you can define different rules to serialize (output to CSV/JSON/YAML) vs. deserialize (read from CSV/JSON/YAML), which can be useful for security purposes, for example
  • read numerical or other data from CSV and convert it to appropriate data types for your database representation
  • read from files or strings
  • apply custom preprocessing logic when either reading data or writing data

Given your model code above, here’s an example of how you might read corresponding records from CSV:

import csv

my_record_instances = []
with open('large-data.csv', 'r') as csv_file:
    csv_records = csv.reader(csv_file, delimiter=',')
    next(csv_records)  # skip the header row
    for row in csv_records:
        # re-join the parsed fields into a CSV string for SQLAthanor
        my_instance = Asset.new_from_csv(','.join(row),
                                         delimiter=',')
        my_record_instances.append(my_instance)

The code above iterates through each row of the CSV file, creates an instance of the declarative model class from that row, and appends the instance to a list. You can then interact with these instances the way you would normally interact with any SQLAlchemy model instance; note that they are not persisted until you add them to the session and commit, e.g. `db.session.add_all(my_record_instances)` followed by `db.session.commit()`.

Note that in order for the code above to work properly, you will need to incorporate SQLAthanor into your model definition, which is actually pretty easy. For more information, see Using SQLAthanor.
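As a rough sketch of what that incorporation might look like with Flask-SQLAlchemy (the `FlaskBaseModel` / `initialize_flask_sqlathanor` names and the `supports_csv` argument are taken from the SQLAthanor documentation; verify the exact signatures there before relying on them):

```python
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from sqlathanor import FlaskBaseModel, initialize_flask_sqlathanor

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///db.db'

# Use SQLAthanor's declarative base so models gain the
# serialization/de-serialization methods (e.g. new_from_csv)
db = SQLAlchemy(app, model_class=FlaskBaseModel)
db = initialize_flask_sqlathanor(db)


class Asset(db.Model):
    __tablename__ = 'users'
    # supports_csv=True enables both reading from and writing to CSV;
    # a (True, False) tuple would enable de-serialization only
    id = db.Column(db.Integer, primary_key=True, autoincrement=True,
                   supports_csv=True)
    foo = db.Column(db.String, supports_csv=True)
    bar = db.Column(db.Boolean, supports_csv=True)
    column1 = db.Column(db.String, supports_csv=True)
    column2 = db.Column(db.Integer, supports_csv=True)
```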

Here are some links to the relevant documentation that you may find helpful:

Hope this helps!

Chris Modzelewski

I'm not sure about the speed, but this works:

import pandas as pd

engine = db.get_engine()  # db is the one from the question
csv_file_path = 'large-data.csv'

# Read the CSV with pandas
df = pd.read_csv(csv_file_path)

# Insert into the DB. 'append' keeps the table created by
# db.create_all(); with 'replace', pandas would drop it and
# recreate it with inferred column types. index_label is only
# meaningful when the DataFrame index is written, so it is
# omitted here.
df.to_sql('users',
          con=engine,
          index=False,
          if_exists='append')
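One caveat with this approach: pandas infers column types itself, so `bar` arrives as 0/1 integers rather than booleans. A self-contained sketch of coercing it before the insert (using an in-memory `sqlite3` connection here instead of the Flask-SQLAlchemy engine, purely so the example runs on its own):

```python
import io
import sqlite3

import pandas as pd

# Inline copy of large-data.csv from the question
CSV = """id,foo,bar,column1,column2
1,"abc",0,"gh",42
2,"xy",1,"bla",1337
"""

# Parse the CSV and coerce bar from 0/1 integers to booleans
df = pd.read_csv(io.StringIO(CSV))
df['bar'] = df['bar'].astype(bool)

# to_sql also accepts a plain sqlite3 connection; 'append'
# creates the table if it does not exist yet
conn = sqlite3.connect(':memory:')
df.to_sql('users', con=conn, index=False, if_exists='append')

rows = conn.execute('SELECT id, foo, bar FROM users ORDER BY id').fetchall()
print(rows)  # SQLite has no native boolean, so bar is stored back as 0/1
```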
Martin Thoma