I am using sqlalchemy to write to a mysql db where I am indexing some files and storing their contents. I need to write the files and then write the index entries which have a foreign key to the files
table. However, sqlalchemy seems to be issuing the INSERT
statements out of order.
Here is a minimal functional example illustrating the problem using mock random data (less the configuration file, which contains server specific info):
Index/ORM.py:
#!/bin/env python2.7
from __future__ import print_function
import os
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.dialects.mysql import LONGBLOB, INTEGER
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, sessionmaker
from sqlalchemy import create_engine
from Index import load_cfg
class Base(object):
"""
Basic MySQL table settings
"""
__table_args__ = {
'mysql_engine': 'InnoDB',
'mysql_collate': 'latin1_general_cs'
}
Base = declarative_base(cls=Base)
class CoverageIndex(Base):
"""
Class for coverage_index table objects
"""
__tablename__ = 'coverage_index'
filename = Column(String(45), primary_key=True)
#filename = Column(String(45), ForeignKey("files.filename"), primary_key=True)
sequence_id = Column(String(45), primary_key=True, index=True)
def __init__(self, filename, sequence_id):
self.filename = filename
self.sequence_id = sequence_id
class FileRow(Base):
"""
Class for files stored in db
"""
__tablename__ = 'files'
filename = Column(String(45), primary_key=True)
contents = Column(LONGBLOB)
def __init__(self, filename, contents):
self.filename = filename
self.contents = contents
cfg = load_cfg()
db_string = 'mysql://%(user)s:%(passwd)s@%(host)s/%(db)s' % cfg['db_config']
engine = create_engine(db_string, echo=True)
Base.metadata.create_all(engine)
if __name__ == '__main__':
pass
index.py:
#!/usr/bin/env python2.7
from __future__ import print_function
import os
import sys
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import sessionmaker
from sqlalchemy import create_engine
from sqlalchemy.exc import IntegrityError
from Index.ORM import Base, CoverageIndex, FileRow, engine as db_engine
if __name__ == '__main__':
import string, random
data = {}
for i in range(0,10):
file = 'file' + str(i)
data[file] = {
'seqs': ['seqa' + str(i), 'seqb' + str(i)],
'contents': '\n'.join([''.join([random.choice(string.letters) for x in range (0, 80)]) for y in range (0, 2500)])}
#print (data)
Base.metadata.bind = db_engine
DBSession = sessionmaker(bind=db_engine)
session = DBSession()
for file, datum in data.iteritems():
file_query = session.query(FileRow).filter(FileRow.filename == file)
if file_query.count() > 0:
session.query(CoverageIndex).filter(CoverageIndex.filename == file).delete(synchronize_session='fetch')
file_query.delete(synchronize_session='fetch')
for i in datum['seqs']:
# Write to DB
fqc = file_query.count()
print ("No. of files: " + str(fqc))
if fqc == 0:
print ("Adding: ")
fr = FileRow(
filename = file,
contents = datum['contents']
)
session.add(fr)
cov = CoverageIndex(
filename = file,
sequence_id = i)
session.add(cov)
try:
session.commit()
except:
#print ("SQL Commit Failed: %s" % file)
session.rollback()
session.close()
raise
session.close()
Here is a part of the output of one run. I would like to draw your attention to line 2018-03-13 16:05:40,291
and ...,292
:
...
2018-03-13 16:05:40,287 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2018-03-13 16:05:40,288 INFO sqlalchemy.engine.base.Engine SELECT count(*) AS count_1
FROM (SELECT files.filename AS files_filename, files.contents AS files_contents
FROM files
WHERE files.filename = %s) AS anon_1
2018-03-13 16:05:40,288 INFO sqlalchemy.engine.base.Engine ('file1',)
2018-03-13 16:05:40,290 INFO sqlalchemy.engine.base.Engine SELECT count(*) AS count_1
FROM (SELECT files.filename AS files_filename, files.contents AS files_contents
FROM files
WHERE files.filename = %s) AS anon_1
2018-03-13 16:05:40,290 INFO sqlalchemy.engine.base.Engine ('file1',)
No. of files: 0
Adding:
2018-03-13 16:05:40,291 INFO sqlalchemy.engine.base.Engine INSERT INTO coverage_index (filename, sequence_id) VALUES (%s, %s)
2018-03-13 16:05:40,291 INFO sqlalchemy.engine.base.Engine ('file1', 'seqa1')
2018-03-13 16:05:40,292 INFO sqlalchemy.engine.base.Engine INSERT INTO files (filename, contents) VALUES (%s, %s)
2018-03-13 16:05:40,292 INFO sqlalchemy.engine.base.Engine ('file1', 'BkTsRJTcNEigPFjofFxDmwVZDXRAsPECawRUjiFZTDGWWoLZzLnGlCwQQeAFyXhLqKjPAJmme
mFNfVzF\nJlZSvwGAdoImTnBAmcrSdMRDvxNYnnMfbQXdfuXulqufiIYpqjFUgfElZSrVkvBvPTg ... (204700 characters truncated) ... trwtYOycEOuDTVxsXeGoNYKAqHlE
LGPqcimwzwAFAEsCZGBBnGzYMHgabgnGZaGmQsn\nSNjYvBwSVdXVKbmJpKdSHSXCDKKvDlkyLxOxsEfOtmlCRruqzaiPhYRocKZQEJSVrtSHncFMBMTEpWUX')
2018-03-13 16:05:40,310 INFO sqlalchemy.engine.base.Engine SELECT count(*) AS count_1
FROM (SELECT files.filename AS files_filename, files.contents AS files_contents
FROM files
WHERE files.filename = %s) AS anon_1
2018-03-13 16:05:40,310 INFO sqlalchemy.engine.base.Engine ('file1',)
No. of files: 1
2018-03-13 16:05:40,311 INFO sqlalchemy.engine.base.Engine INSERT INTO coverage_index (filename, sequence_id) VALUES (%s, %s)
2018-03-13 16:05:40,311 INFO sqlalchemy.engine.base.Engine ('file1', 'seqb1')
2018-03-13 16:05:40,312 INFO sqlalchemy.engine.base.Engine COMMIT
...
Here, you can see that sqlalchemy is inserting the coverage_index
before inserting the files
object. I presume that this is because the files object is much bigger and taking some time to prepare, and so the engine decides to asynchronously run the later INSERT
first.
However, the files
entry needs to be inserted first, because filename
in coverage_index
should be a foreign key to files
. (It throws an exception if I do this with the foreign key constraint defined)
I know I can commit after adding to files
, but I would prefer that the files
and coverage_index
INSERT
be in the same transaction, so they stay in sync.
So the question is, is there a way to force sqlalchemy to execute synchronously within a transaction?