4

Is it possible to use a couple of fields not from the primary key to retrieve items (already fetched earlier) from the identity map? For example, I often query a table by (external_id, platform_id) pair, which is a unique key, but not a primary key. And I want to omit unnecessary SQL queries in such cases.

Ivan Velichko
  • 6,348
  • 6
  • 44
  • 90

2 Answers2

6

A brief overview of identity_map and get():

An identity map is kept for a lifecycle of a SQLAlchemy's session object i.e. in case of a web-service or a RESTful api the session object's lifecycle is not more than a single request (recommended).

From : http://martinfowler.com/eaaCatalog/identityMap.html

An Identity Map keeps a record of all objects that have been read from the database in a single business transaction. Whenever you want an object, you check the Identity Map first to see if you already have it.

In SQLAlchemy's ORM there's this special query method get(), it first looks into identity_map using the pk (only allowed argument) and returns object from the identity map, actually executing the SQL query and hitting the database.

From the docs:

get(ident)

Return an instance based on the given primary key identifier, or None if not found.

get() is special in that it provides direct access to the identity map of the owning Session. If the given primary key identifier is present in the local identity map, the object is returned directly from this collection and no SQL is emitted, unless the object has been marked fully expired. If not present, a SELECT is performed in order to locate the object.


Only get() is using the identity_map - official docs:

It’s somewhat used as a cache, in that it implements the identity map pattern, and stores objects keyed to their primary key. However, it doesn’t do any kind of query caching. This means, if you say session.query(Foo).filter_by(name='bar'), even if Foo(name='bar') is right there, in the identity map, the session has no idea about that. It has to issue SQL to the database, get the rows back, and then when it sees the primary key in the row, then it can look in the local identity map and see that the object is already there. It’s only when you say query.get({some primary key}) that the Session doesn’t have to issue a query.


P.S. If you're querying not using pk, you aren't hitting the identity_map in the first place.


Few relevant SO questions, helpful to clear the concept:

Forcing a sqlalchemy ORM get() outside identity map

SuperShoot
  • 9,880
  • 2
  • 38
  • 55
Nabeel Ahmed
  • 18,328
  • 4
  • 58
  • 63
  • 3
    Yes, I've read almost all this docs too. However, I still don't understand why the identity map is not used in case of querying by non-PK unique params. However, now I think that my question should be addressed directly to the SQLAlchemy developers. – Ivan Velichko Dec 17 '15 at 13:03
  • It'd go against the purpose, as then identity_map has to map a lot of stuff, especially a single object multiple times, with various fields as the map key. – Nabeel Ahmed Dec 17 '15 at 13:07
  • Yeah you can ask SQLAlchemy developers for allowing some parameter to set the default field for the get() method (instead of only the pk), hence 'll be used as key in indentity_map - but it has to be consistent application wide. – Nabeel Ahmed Dec 17 '15 at 13:10
  • Why do I have to map single object multiple times in case of using unique keys during access the identity map? As you've quoted from the docs, SQLAlchemy checks the identity map after performing an arbitrary SQL query to detect objects already were fetched earlier. The same approach can be used to reduce duplication (because `PK <-> Unique Key` is always a one-to-one relationship). – Ivan Velichko Dec 17 '15 at 16:00
  • I said it in the context of 'if the get() method, which uses the identity_map, be set/allowed to take fields also other than the PK ' - in that case there'll be multiple copies of a single object i.e. with various fields as the key in identity map. – Nabeel Ahmed Dec 17 '15 at 19:21
  • Why don't you specify the id pair of yours as PK ? PrimaryKeyConstraint - http://docs.sqlalchemy.org/en/latest/core/constraints.html?highlight=primarykeyconstraints#primary-key-constraint – Nabeel Ahmed Dec 17 '15 at 19:24
5

It's possible to access the whole identity map sequentially:

for obj in session.identity_map.values():
    print(obj)

To get an object by arbitrary attributes, you then have to filter for the object type first and then check your attributes.

It's not a lookup in constant time, but can prevent unnecessary queries.

There is the argument, that objects may have been modified by another process and the identity map doesn't hold the current state, but this argument is invalid: If your transaction isolation level is read committed (or less) - and this is often the case, data ALWAYS may have been changed immediately after the query is finished.

Krangerich
  • 293
  • 3
  • 6
  • It was long time ago, but the problem was that when you access the same (on the db level) object twice in an app process (first time by PK, second time by Unique Key), you will have two full copies of the object instead of two references. And modification of such object in your process in these two places will not change the other copy of the same object without refreshing it. So the problem exists even if you don't have any concurrent accesses. – Ivan Velichko Feb 26 '18 at 12:48