Context
I am looking for a way to programmatically introduce LIMIT
and OFFSET
in any SQL query (provided in a conf file). To pad the initial query with f"{query} LIMIT {params.offset} OFFSET {batch * params.offset}
typically works for simple SQL queries...
SELECT * FROM some_table LIMIT 100000 OFFSET 3400000
... but is incorrect for JOIN based queries (and probably many other cases)
SELECT d.DiagnosticIDUnique, d.LogementID, d.DateValidite, TypeEnergieLibelle_list, td.TypeDiagnostic, c.NomClient, c.SiretClient
FROM DIAGNOSTIC as d LEFT JOIN TYPE_DIAGNOSTIC as td ON d.TypeDiagnosticID = td.TypeDiagnosticID
LIMIT 100000 OFFSET 3400000
Moreover, LIMIT
and OFFSET
won't work for some DBMS such as MSSQL, which, from what I understood, relies on an ORDER BY
and expects something like
SELECT * FROM some_table
ORDER BY Id
OFFSET 3400000 ROWS
FETCH NEXT 10000 ROWS ONLY
Need
Managing this complexity and providing an high level API is typically the job of an ORM. My project is based on pandas and comes with SQLAlchemy as a dependance. I wondered if I could
- load a query string to create a
Query
object - easily introduce a pagination and
- convert it back to a query string for the target DBMS
I am a complete noob on SQLAlchemy (only used Django's in previous projects). Here are some of the questions I am asking myself before digging into it
- Should I necessarily go through the creation of data models?
- Is this an overkill ? (some simpler, almost generic way to handle things)