I have a 1,000,000 records that I am trying to enter to the database, some of the records unfortunately are not standing with the db schema. At the moment when a record failed I am doing:
- rollback to the database
- observer the exception
- fix the issue
- run again. I wish to build a script which would save a side all "bad" records but would commit all the correct ones.
Of course I can commit one by one and then when the commit fail rollback and commit the next but I would pay a "run time price" as the code would run for a long time.
In the example below i have two models: File and Client.The a relation one client has many files. In the commit.py file i wish to commit 1M File objects at once or at batches (1k). at the moment I only understand when something failed when i commit at the end, is there a way to know which object are "bad" ( Integrity errors with the foreign key as example) before, i.e park a side ( in another list) but committing all the "good"
thx a lot for the help
#model.py
from sqlalchemy import Column, DateTime, String, func, Integer, ForeignKey
from . import base
class Client(base):
__tablename__ = 'clients'
id = Column(String, primary_key=True)
class File(base):
__tablename__ = 'files'
id = Column(Integer, primary_key=True, autoincrement=True)
client_id = Column(String, ForeignKey('clients.id'))
path = Column(String)
#init.py
import os
from dotenv import load_dotenv
from sqlalchemy.orm import sessionmaker
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
load_dotenv()
db_name = os.getenv("DB_NAME")
db_host = os.getenv("DB_HOST")
db_port = os.getenv("DB_PORT")
db_user = os.getenv("DB_USER")
db_password = os.getenv("DB_PASSWORD")
db_uri = 'postgresql://' + db_user + ':' + db_password + '@' + db_host + ':' + db_port + '/' + db_name
print(f"product_domain: {db_uri}")
base = declarative_base()
engine = create_engine(db_uri)
base.metadata.bind = engine
Session = sessionmaker(bind=engine)
session = Session()
conn = engine.connect()
#commit.py
from . import session
def commit(list_of_1m_File_objects_records):
#I wish to for loop over the rows and if a specific row rasie excaption to insert it to a list and handle after wards
for file in list_of_1m_File_objects_records:
session.add(file)
session.commit()
# client:
# id
# "a"
# "b"
# "c"
# file:
# id|client_id|path
# --|---------|-------------
# 1 "a" "path1.txt"
# 2 "aa" "path2.txt"
# 3 "c" "path143.txt"
# 4 "a" "pat.txt"
# 5 "b" "patb.txt"
# wish the file data would enter the database although it has one record "aa" which will raise integrity error