
I am trying to create many objects. Currently each object is created by calling session.add() followed by session.flush(). This is bad because over 500 objects are being created, so each one issues its own round trip to the database, which hurts performance. I am trying to optimise this.

To give some background, here is roughly what my object looks like:

class Metric(Base):
    __tablename__ = "metric"
    id = Column(Integer, primary_key=True)
    data = Column(String)
    fk_document_id = Column(Integer, ForeignKey("document.id"))
    fk_user_id = Column(Integer, ForeignKey("user.id"))
    document = relationship("Document")
    user = relationship("User")

The key thing to highlight: this object has two relationships, implemented using the relationship property that SQLAlchemy offers.

My first attempt was to switch to bulk_save_objects(); however, this method does not save the relationships, and it also requires a session.commit(), which I'd ideally avoid because there is a chance this transaction could get rolled back after the Metric objects are created.

Then I found add_all(), but it just performs an add() for each object anyway, so I do not expect it to improve the current performance much.

I can't see any other approach, so my question is: is there a way in this case to use bulk_save_objects() so that it persists relationships and, ideally, does not require a commit? Alternatively, if there is another method to do this that I am not seeing, please let me know.
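For context, one workaround I have seen mentioned is bulk_insert_mappings(), which writes the rows directly and skips the relationship machinery entirely, so it only applies if the FK ids are already known up front. This is just a sketch with illustrative table and column names, not my real schema; note the inserts run inside the session's open transaction, so a rollback is still possible:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Metric(Base):
    __tablename__ = "metric"
    id = Column(Integer, primary_key=True)
    data = Column(String)
    fk_document_id = Column(Integer)
    fk_user_id = Column(Integer)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Rows are inserted as plain mappings; FK values must be supplied
    # explicitly because no relationship cascade runs here.
    session.bulk_insert_mappings(
        Metric,
        [{"data": f"m{i}", "fk_document_id": 1, "fk_user_id": 1} for i in range(500)],
    )
    print(session.query(Metric).count())  # prints 500
    # Nothing has been committed yet, so the transaction can still be undone:
    session.rollback()
```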

Sorin Burghiu
  • Why do you have to `flush()` after each `session.add()`? – sudden_appearance Apr 28 '22 at 09:07
  • This is just the current implementation. Removing it seems to break the functionality, but I managed to refactor it to use `add_all()` with a single `flush()` afterwards; however, I do not think this will impact the performance much. – Sorin Burghiu Apr 28 '22 at 09:09
  • It will... `session.add()` doesn't run any SQL queries; it only queues the objects until flush is called. That's why `session.add()` doesn't raise `IntegrityError`, `DBApiError`, etc. `flush()` creates a transaction block (`BEGIN ... COMMIT;` for Postgres), passes all the queued SQL statements inside it, and runs one database request, whereas repeated `add()` and `flush()` run a separate request for each `flush()` – sudden_appearance Apr 28 '22 at 09:14
  • Read more [here](https://stackoverflow.com/questions/4201455/sqlalchemy-whats-the-difference-between-flush-and-commit) – sudden_appearance Apr 28 '22 at 09:16
  • I see, in that case I will do some testing to see how much it affects performance. I thought I didn't need to, since `add_all()` just calls `add()` for each object, so I assumed it would not make a difference. I'll get back with results. Thanks for the info. – Sorin Burghiu Apr 28 '22 at 09:16
  • `add_all()` is a shortcut for passing array of objects and nothing more. – sudden_appearance Apr 28 '22 at 09:17
  • Why do you need to flush? If it is to get the foreign key IDs of the relationships before adding the dependent objects, that is not the way: just let the session object manage those foreign keys for you. Can you also provide an example of the other tables (relationship tables) and a piece of the code showing how you currently add the objects to the session? – jossefaz Apr 28 '22 at 10:24
  • Update: Using `add_all()` with a single `flush()`, instead of `add()` then `flush()` for each object, resulted in about a 1.5x performance improvement. Still not the best, but it's a bit better than before. If you want to add an answer for this, I'd be happy to accept it. Thanks – Sorin Burghiu Apr 28 '22 at 10:57
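The refactor discussed in the comments above can be sketched as follows; the models and names here are illustrative stand-ins, not the asker's actual schema. The point is that add_all() only queues the objects, and a single flush() lets the unit of work insert the parents first and fill in the foreign keys from the relationships, instead of paying for a flush per object:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class Document(Base):
    __tablename__ = "document"
    id = Column(Integer, primary_key=True)

class User(Base):
    __tablename__ = "user"
    id = Column(Integer, primary_key=True)

class Metric(Base):
    __tablename__ = "metric"
    id = Column(Integer, primary_key=True)
    data = Column(String)
    fk_document_id = Column(Integer, ForeignKey("document.id"))
    fk_user_id = Column(Integer, ForeignKey("user.id"))
    document = relationship("Document")
    user = relationship("User")

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    doc, user = Document(), User()
    # Assigning the relationship attributes is enough; the save-update
    # cascade pulls doc and user into the session automatically.
    metrics = [Metric(data=f"m{i}", document=doc, user=user) for i in range(500)]
    session.add_all(metrics)  # no SQL emitted yet, objects are just pending
    session.flush()           # one flush pass emits all pending INSERTs
    print(metrics[0].fk_document_id == doc.id)  # prints True: FK resolved by the flush
    session.commit()
```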

0 Answers