How to index and query a very large DB with 60M rows and 50 columns

Question

I have a big table with 60M rows and 50 columns (columns include "company_idx" and "timestamp"). Thus, when I do my simple SQL Query such as:

SELECT * FROM companies_Scores.Scores 
WHERE `company_idx`=11 
  AND `timestamp` BETWEEN  '"+start_date+" 00:00:00' AND '"+end_date+" 00:00:00'

It takes basically 4 minutes to run (which is way too long). Thus, I thought about indexing my table, so I've done:

CREATE INDEX idx_time ON companies_Scores.Scores(company_idx, timestamp) USING BTREE;

However, when I now do the following, it takes also 4 minutes to run.

SELECT * FROM companies_Scores.Scores 
USE INDEX(idx_time) 
WHERE `company_idx`=11 
  AND `timestamp` BETWEEN  '"+start_date+" 00:00:00' AND '"+end_date+" 00:00:00'

I'm really a beginner with SQL and indexes. So I'm not really sure how to use indexes in a query. I guess the one I've done above is correct? Why does it take so much time? How can I improve it? I'd like my queries for each company_idx to be as quick as possible.

When I run EXPLAIN, I get:

[{'Cardinality': 115751,
  'Collation': 'A',
  'Column_name': 'company_idx',
  'Comment': '',
  'Index_comment': '',
  'Index_type': 'BTREE',
  'Key_name': 'idx_time',
  'Non_unique': 1,
  'Null': 'YES',
  'Packed': None,
  'Seq_in_index': 1,
  'Sub_part': None,
  'Table': 'Scores'},
 {'Cardinality': 45831976,
  'Collation': 'A',
  'Column_name': 'timestamp',
  'Comment': '',
  'Index_comment': '',
  'Index_type': 'BTREE',
  'Key_name': 'idx_time',
  'Non_unique': 1,
  'Null': 'YES',
  'Packed': None,
  'Seq_in_index': 2,
  'Sub_part': None,
  'Table': 'Scores'}]

What does this have to do with python? Please only add relevant tags (removed) — rdas, Oct 03 '19 at 18:10
Please run `EXPLAIN ..` on the query and add the details of the same to question. Also, do you really need to do `SELECT * ..` ? Do read: [Why is SELECT * considered harmful?](https://stackoverflow.com/q/3639861/2469308) — Madhur Bhaiya, Oct 03 '19 at 18:14
@mike.depetriconi This does not look like result of `EXPLAIN ..` statement. Are you sure that you are using MySQL ? Please run `EXPLAIN ` — Madhur Bhaiya, Oct 03 '19 at 18:34
@mike.depetriconi `USE INDEX` is just a hint; it does not force MySQL to use that index only. Instead try running the query with `FORCE INDEX(...)` — Madhur Bhaiya, Oct 03 '19 at 18:42
@MadhurBhaiya it looks like it improved by a minute. But my request still takes 3 minutes to run. How can I do? — mike.depetriconi, Oct 03 '19 at 18:56
@mike.depetriconi The data you are trying to fetch in a single query is too big. Do you really need to get all these thousands (or maybe millions) of rows at once. Generally, we use `LIMIT ..` clause and use pagination. Also, do you really need to do `SELECT *` ? Why not just fetch the relevant fields only — Madhur Bhaiya, Oct 03 '19 at 18:58
Here I would suggest instead of index use MySql [Range Partitioning](https://dev.mysql.com/doc/refman/5.7/en/partitioning-range.html) that will definately increase perform. Partition your taable on timestamp (may be month wise or year wise whichever has less data) and don't forgot to use timestamp condition as first condition in where clause. — Santosh D., Oct 08 '19 at 04:57
@SantoshD. - If it can use `INDEX(country_idx, timestamp)`, then no flavor of partitioning will run any faster. — Rick James, Oct 12 '19 at 22:49

score 0 · Answer 1 · answered Oct 03 '19 at 18:14

0

Your index looks correct for the query. You are forcing index usage, so we can assume the index is being used, if possible.

One issue may be that the index cannot be used. That would occur if you have type problems with the columns. For instance, the comparison value 11 is a number. If customer_idx is a string, you have a problem. The comparison should be a string -- '11'.

Another issue is simply that there might be a lot of data. If even a few percent of the rows match the conditions, the index is not going to help. One major use of indexes is for "needle-in-the-haystack" queries. They help finding a small subset. They don't help if you need all or much of the haystack.

answered Oct 03 '19 at 18:14

Gordon Linoff

1,242,037
58
646
786

`company_idx` is varchar(20), and in my query 11 is a str too. I'm retrieving around 3600 rows of data on every query (so it's a relatively small subset out of 60M rows, right?) – mike.depetriconi Oct 03 '19 at 18:26
@mike.depetriconi . . . 3600 rows may or may not be small, depending on how many pages need to be read. 4 minutes seems like a long time to read 3600 pages, but some time makes sense. – Gordon Linoff Oct 03 '19 at 21:43
@mike.depetriconi - You did not write it as a string! Add the quotes are try again! – Rick James Oct 12 '19 at 22:50

How to index and query a very large DB with 60M rows and 50 columns

1 Answers1