1

We have a MySQL table (table_ha) like this one:

Name = table_ha
+----------+------------------+
| hash_loc |     hash_val     |
+----------+------------------+
|  242342  | 9606075000001005 |
+----------+------------------+
|  431231  | 9606075000005208 |
+----------+------------------+
|  342344  | 7645345456745536 |
+----------+------------------+
|  324254  | 7656453453465788 |
+----------+------------------+
|  656456  | 9788674534546766 |
+----------+------------------+
|  674453  | 3458752778456834 |
+----------+------------------+
|   ...    |       ...        |
+----------+------------------+
|  765874  | 8796634586346785 |
+----------+------------------+
|  864534  | 9834667054534588 |
+----------+------------------+

We continuously execute queries like the following one:

SELECT * FROM table_ha (SELECT 1 AS hash_loc UNION ALL SELECT 28700 UNION ALL SELECT 28728 ... UNION ALL SELECT 28680 UNION ALL SELECT 28694) AS T1 ON table_ha.hash_loc = T1.hash_loc'

We must assume that we might have thousands of numbers in the query (enclosed in UNION ALL SELECT X). When the number of rows in table_ha is not high, it works ok. Now, imagine having thousands of millions of rows. Then it becomes very slow.

Do you know if partitioning can work in a case like this? How can that be applied to the present table? Do you now any other alternative to this?

NOTE: hashloc is a BigInt(32) and hash_val is a BigInt(64)

juanba1984
  • 357
  • 2
  • 13

1 Answers1

1

I don't think partitioning is needed in this case. I would recommend making sure you have an index on table_ha.hash_loc.

I'm not sure why you use the subquery with UNION instead of just using an IN() predicate:

SELECT * FROM table_ha 
WHERE hash_loc IN (1, 28700, 28728 ... 28680, 28694);

By the way, BIGINT(32) is the same as BIGINT(64). See my answer to Types in MySQL: BigInt(20) vs Int(20)


Re your comment:

Partitioning only helps when you're searching on the column(s) used for the partitioning key. And you can only partition a given table one way. Indexes are more generally useful, because you can create multiple indexes per table.

I work with tables of hundreds of millions of rows, and the indexes help a lot. But indexes have to be designed carefully to match each specific query you want to optimize.

You might like my presentation How to Design Indexes, Really. Also there's a video of me presenting it: https://www.youtube.com/watch?v=ELR7-RdU9XU

Community
  • 1
  • 1
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Hi Bill, in this table, I have a composite primary_key, (hash_loc, hash_val), because hash_loc is not unique. But later, when I make queries, I only search by hash_loc. I think your query might work if hash_loc didn't allow repeted values, but it actually contains the same value multiple times. – juanba1984 Jan 17 '17 at 18:13
  • But that's no different from the query you posted in your question. – Bill Karwin Jan 17 '17 at 18:58
  • Ok, thank you, you are right. But, my question is more focused on how to make the table scalable when I have thousands of millions of rows. The more it grows, the longer it takes to find what I am looking for. Can't partitioning help here? – juanba1984 Jan 17 '17 at 19:27
  • Hi Bill. After studying your answers in detail I just understood why your solution was the correct one. Thank you very much!!!! – juanba1984 Jan 23 '17 at 00:12
  • I'm glad to help! – Bill Karwin Jan 23 '17 at 15:06