7

Issue Description

Adding a limit to a Sequelize Query with a SubQuery fails to limit retrievals. Multiple online resources referencing this error and no solutions. Is this a Sequelize error or user error?

What are you doing?

ThreadFolderUser.findAll({
order: [
  ['updated_at', 'DESC']
],
where: {
  user_id,
  folder_id,
  deleted,
  archived,
},
distinct: true,
offset,
limit: 10,
include: [
  {
    model: Thread,
    include: [
      { model: Email, include: [Attachment] },
    ]
  }
],

})


Associations

// ThreadFolderUser (assoc table) - Thread / Folder / User (tables)
User.hasMany(ThreadFolderUser, { foreignKey: 'user_id' })
ThreadFolderUser.belongsTo(User, { foreignKey: 'user_id' })
Folder.hasMany(ThreadFolderUser, { foreignKey: 'folder_id' })
ThreadFolderUser.belongsTo(Folder, { foreignKey: 'folder_id' })
Thread.hasMany(ThreadFolderUser, { foreignKey: 'thread_id' })
ThreadFolderUser.belongsTo(Thread, { foreignKey: 'thread_id' })

// Thread - Emails
Thread.hasMany(Email, { foreignKey: 'thread_id' })
Email.belongsTo(Thread, { foreignKey: 'thread_id' })

// Email - Attachments
Email.hasMany(Attachment, { foreignKey: 'email_id' })
Attachment.belongsTo(Email, { foreignKey: 'email_id' })

What do you expect to happen?

I expected 10 records (based on the limit currently set to 10) retrieved from the AssociationTable, since I have at least 15 records in the database that match this query.

What is actually happening?

Returns 6 in my case, instead of 10 (with the limit set to 10). Instead of pulling the first 10 matches.


Additional context

If I remove the limit, it works as intended (even with the includes).

If I remove the include, it works as intended (even with the limit).

If I copy/paste the SQL Query generated by Sequelize and insert it directly into Workbench, it retrieves the proper amount of rows.

It seems the issue is the limit combined with the include cause the query to retrieve only the records that match within the first 10 searched in the DB.


Other references to the same issue without a proper solution presented:


Environment

  • Sequelize version: v5.21.3
  • Node.js version: v12.13.1
  • Operating System: AWS Lambda Function
  • TypeScript version: 3.7.2

I'm well aware that this exact same issue has been brought up in multiple other threads and platforms -- as I have linked a few of them above -- however none of them have a direct answer, and 1 of them marked an irrelevant point as the answer which did not solve the intended issue. I'm hoping we can get an answer to this, or a realistic workaround beyond hard coding the SQL Query (last resort).

It would be unthinkable for Sequelize not to be able to handle a limit with an include in the same query, so there must be something missing / user error on my side. I've searched multiple times and certainly started with Sequelize documentation, of which does not reference this issue or a similar example, or any problems that may arise with combining a limit and include.

Many thanks for any contributions made to help solve this issue. Hopefully some @Sequelize Engineer is out there able to help answer this :)

Speros
  • 361
  • 2
  • 4
  • 10

1 Answers1

19

You cannot properly limit a sequelize query that including hasMany association until you make sequelize to get included association objects by separate queries. In your query you have include with the association Thread.hasMany(Email so you should indicate separate: true in the Email include like this (this also goes for Attachment association):

include: [
      { 
       model: Email, 
       separate: true,
       include: [{
         model: Attachment,
         separate: true
       }] },
    ]

Also you don't need to indicate distinct: true because we already indicated to separate hasMany associations into its own queries.

Another problem with hasMany includes (especially with nested hasMany) in queries that they turn into JOIN's in a SQL query that means a DB multiplies an amount of main records to an amount of nested records and so on. For instance: 100 main records each has 100 linked records each has its own linked 100 records (Thread -> Email -> Attachment). All in all you make a DB to query 100*100*100 - 1 million records at once! It usually leads to out of memory.

About LIMIT and hasMany: a DB selects 100 main records with 100 linked for each (10000 records at once) and after that it takes first 10 records from these 10000 records (not from 100 main records). That's how SQL queries work!

Anatoly
  • 20,799
  • 3
  • 28
  • 42
  • This is an amazing answer, and I have yet to see anything remotely close to a logical answer such as this. Thank you for providing the solution and most importantly explaining why it works this way. I am dealing with a large database, over 10 million email records (individual records in 1 table) from an old DB, of which I'll be creating a script to convert them into threads connected to relative emails and attachments. Due to this I'm hoping the JOIN statements won't be anywhere near the 100*100*100, but point well taken and I think I understand it, I'll keep an eye on this. – Speros May 12 '20 at 22:58
  • To confirm if understand this right, if I'm pulling 20 threads, with an average of 15 emails each and an average of 5 attachments each that would be a query of 1500? Or will it actually be pulling (all matching threads) * (all matching emails on those threads) * (all matching attachments on this emails) and then limiting down to the main 20, so potentially my queries will always be pulling a lot more records before filtering down to the main 20 limit/offset - am I correct on this thinking? – Speros May 12 '20 at 22:58
  • So my concern will be the main query's where statement to ensure its only pulling the necessary association records, in this case, the association records specific to that 1) user and 2) mailbox (of which I've named folder), so if there are 500 records matching this user/folder combination, it will always start at 500 * matching Emails to the 500 * matching Attachments to the Emails - and then pair down to the limit and offset, is correct? – Speros May 12 '20 at 22:59
  • 1
    If you indicate `separate: true` then you'll get 20 threads (1 query with 20 threads only filtered and limited), then 20 separate queries to get emails then 20*5 separate queries to get attachments. – Anatoly May 13 '20 at 16:04
  • 1
    Emails and attachments will be queried only for already filtered and limited threads – Anatoly May 13 '20 at 16:07