0

I'm running into some timeouts with MySQL where I suspect the issue is with time-based queries of labels in my database.

I have the following labels table in MySQL, where the attributes are as follows:

label_id    PRIMARY KEY
entity_id   FOREIGN KEY
asset_id    FOREIGN KEY
from        DATETIME (INDEXED FIELD)
to          DATETIME (INDEXED FIELD)
value       STRING

There are currently 550k records in the labels table, which may not seem like a breaking number, but I suspect that the latency is also due to the query pattern, where the label fetch is queried with a date_range, given by a lower and uppper time-bound. e.g. {lower: "2021-01-15T00:00:00.000Z", upper: "2021-01-25T00:00:00.000Z"}. The required fetch is thus given by the following pseudocode:

WHERE labels.from < date_range.upper AND labels.to > date_range.lower
  AND labels.entity_id=entity_id
  AND labels.asset_id=asset_id

Currently, it seems that with the 550k records (and rapidly increasing) in the labels table, I'm seeing higher latency that may be leading to timeouts more and more frequently. Would like to seek some opinion here on what possible solutions I can take (and the pros and cons):

  1. Sharding the labels table - seems irreversible and may be harder to manage in future
  2. Using another database? Any suggestions here?

I couldn't find any easy choices that can do query with multiple fields as well as multiple conditional queries on a datetime field. Even if it is possible with MongoDB for e.g., sharding is probably also needed, and based on what I'm reading so far, I may not get much latency improvements as the indexing and query will be similar, vs the additional overheads of managing an additional database on top of MySQL.

Any advice or help here will be greatly appreciated!! Thanks!

jlyh
  • 681
  • 9
  • 32
  • 2
    Given that time is linear, isn't pseudocode `labels.from AND labels.to < date_range.lower` the same as `labels.to < date_range.lower`? – Strawberry Mar 16 '21 at 08:11
  • Good point! I also found this other question: https://stackoverflow.com/questions/117962/what-is-a-simple-and-efficient-way-to-find-rows-with-time-interval-overlaps-in-s Will try it out to see if helps with the query latency and update the question soon, but I think the question still remains - are there any other better ways to store the labels? Or is this already (close to) the best way and the only way would be to shard the db? – jlyh Mar 16 '21 at 09:59
  • Well, you're doing a search against `from` and `to`, so presumably an index on (from,to) would be more useful than (what you appear to have, which is) a separate index on each. – Strawberry Mar 16 '21 at 11:51
  • Are you getting zero rows? `AND labels=entity_id AND labels=asset_id` will be true only if `entity_id = asset_id`. – Rick James Mar 16 '21 at 20:02
  • Please provide `SHOW CREATE TABLE` for each table and `EXPLAIN SELECT ...` – Rick James Mar 16 '21 at 20:03

0 Answers0