0

Context

I am looking for a way to programmatically introduce LIMIT and OFFSET in any SQL query (provided in a conf file). To pad the initial query with f"{query} LIMIT {params.offset} OFFSET {batch * params.offset} typically works for simple SQL queries...

SELECT * FROM some_table LIMIT 100000 OFFSET 3400000

... but is incorrect for JOIN based queries (and probably many other cases)

SELECT d.DiagnosticIDUnique, d.LogementID, d.DateValidite, TypeEnergieLibelle_list, td.TypeDiagnostic, c.NomClient, c.SiretClient 
FROM DIAGNOSTIC as d LEFT JOIN TYPE_DIAGNOSTIC as td ON d.TypeDiagnosticID = td.TypeDiagnosticID  
LIMIT 100000 OFFSET 3400000

Moreover, LIMIT and OFFSET won't work for some DBMS such as MSSQL, which, from what I understood, relies on an ORDER BY and expects something like

SELECT * FROM some_table 
ORDER BY Id
OFFSET 3400000 ROWS
FETCH NEXT 10000 ROWS ONLY

Need

Managing this complexity and providing an high level API is typically the job of an ORM. My project is based on pandas and comes with SQLAlchemy as a dependance. I wondered if I could

  1. load a query string to create a Query object
  2. easily introduce a pagination and
  3. convert it back to a query string for the target DBMS

I am a complete noob on SQLAlchemy (only used Django's in previous projects). Here are some of the questions I am asking myself before digging into it

  • Should I necessarily go through the creation of data models?
  • Is this an overkill ? (some simpler, almost generic way to handle things)
jarlh
  • 42,561
  • 8
  • 45
  • 63
zar3bski
  • 2,773
  • 7
  • 25
  • 58
  • OFFSET/FETCH FIRST is the ANSI/ISO SQL standard way. Probably most portable too. – jarlh Sep 23 '22 at 12:01
  • ORDER BY always makes sense, at least if you don't mind a more or less random result. – jarlh Sep 23 '22 at 12:02
  • 1
    @jarlh it's the standard, but not [universally supported](https://docs.sqlalchemy.org/en/14/core/selectable.html?highlight=fetch#sqlalchemy.sql.expression.Select.fetch). – snakecharmerb Sep 23 '22 at 12:07
  • 1
    @snakecharmerb, I know, that's why I wrote _Probably most portable_. Is there any more common alternative? – jarlh Sep 23 '22 at 12:09
  • 1
    I don't think so (apart from LIMIT/OFFSET of course), the OP may need to check the RDBMS to select the code to write. Or perhaps some compiler magic in SQLA – snakecharmerb Sep 23 '22 at 12:15
  • Exactly @snakecharmerb . But since a query object in SQLA seems related with a specific connection, I wondered if it could – zar3bski Sep 23 '22 at 12:17
  • 1
    "incorrect for JOIN based queries" not sure why you think that, it's perfectly valid. To try doing this on multiple DBMSs is overkill. Just use your ORM. See eg https://stackoverflow.com/questions/17972020/how-to-execute-raw-sql-in-flask-sqlalchemy-app – Charlieface Sep 23 '22 at 12:46
  • Not all databases support `OFFSET`. Case in point Sybase. And I kind of agree with its decision. – The Impaler Sep 23 '22 at 13:11

0 Answers0