Assuming that I have 20L records:

Approach 1: Hold all 20L records in a single table.
Approach 2: Make 20 tables and enter 1L records into each.

Which is the best method to increase performance, and why? Or are there other approaches?
-
Tables don't have performance; they just sit there. _Queries_ have performance. You can't choose the best optimization method until you know the specific queries you need to optimize. The reason you are not sure what method is best is that you haven't described the queries. – Bill Karwin Oct 08 '22 at 06:46
-
What is 20L? Also, you need to make your own benchmarks. It takes time, but it's the only predictable way. – Raphael PICCOLO Oct 08 '22 at 06:48
-
Does L stand for the Indian lakh? Then we are talking about just two million rows, which is not much. Anyway, don't design your tables according to data size. Don't make this 20 tables; that is just a horrible idea. A table with 2 million rows is not a big deal; the DBMS is designed to manage much more. You should always have appropriate indexes for your common queries to find the data quickly. In some cases it can make sense to work with table partitions. That would be like your twenty-table approach, except that your queries still deal with a single table. – Thorsten Kettner Oct 08 '22 at 07:49
-
What's the demanding query (or queries)? – The Impaler Oct 08 '22 at 15:16
-
Very common question that always has the same answer: do not make lots of identical tables; instead, put all the data into one table. – Rick James Oct 08 '22 at 20:06
-
"Other approaches" -- Let's see the queries; we can help with the indexes. – Rick James Oct 08 '22 at 20:10
-
20 Liters of data? – jarlh Oct 09 '22 at 13:29
3 Answers
Splitting a large table into smaller ones can give better performance -- it is called sharding when the tables are then distributed across multiple database servers -- but when you do it manually it is most definitely an antipattern.
What happens if you have 100 tables and you are looking for a row but you don't know which table has it? If you want an index, you'll need to create it 100 times. If somebody wants to join against the data set, some use cases might force them to include all 100 tables in the join. You'd need to invent your own naming conventions, then document and enforce them yourself with no help from the database catalog. Backup, recovery, and all the other maintenance tasks would be a nightmare... just don't do it.
Instead, just break up the table by partitioning it. You get 100% of the performance improvement you would have gotten from multiple tables, but the database handles the details for you.
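As a minimal sketch of what that looks like in MySQL syntax (the `orders` table and its columns are hypothetical, not from the question):

```sql
-- One logical table, physically split by year; queries still target `orders`.
CREATE TABLE orders (
    id         BIGINT NOT NULL,
    created_at DATE   NOT NULL,
    amount     DECIMAL(10, 2),
    -- MySQL requires the partitioning column in every unique key.
    PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION p2020 VALUES LESS THAN (2021),
    PARTITION p2021 VALUES LESS THAN (2022),
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- A predicate on created_at lets the optimizer prune to matching partitions:
SELECT SUM(amount) FROM orders WHERE created_at >= '2022-01-01';
```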

The best/easiest option is a single table with proper indexes. On 100K rows I had 30 s per query, but with an index I got 0.03 s per query.
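For illustration, a minimal sketch of where that kind of speedup comes from; the `users` table and its `email` column are made up for this example:

```sql
-- Without an index, this filter forces a full table scan:
SELECT * FROM users WHERE email = 'alice@example.com';

-- An index on the filtered column turns the scan into a B-tree lookup,
-- which is the kind of change behind a 30 s -> 0.03 s improvement:
CREATE INDEX idx_users_email ON users (email);
```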
When that no longer suffices, you split tables (for me, that point came at millions of rows), preferably across different servers. You can then create a microservice that accesses all the servers and returns data to consumers as if there were only one database.
But once you do this you had better not need joins, because replicating data across every database gets messy.
I would stick to the first method.

-
Let's see your million-row table that needed splitting. We can probably point out how to fix that. – Rick James Oct 08 '22 at 20:07
When you are looking for read performance, indexes are a great way to improve it. However, indexes can slow down writes.
So if read performance is the goal, prefer indexes. A few things to keep in mind when creating an index:
- Try to avoid NULL values in indexed columns.
- Cardinality of the columns matters. It has been observed that putting a lower-cardinality column first can give better performance than leading with a higher-cardinality column.
- The sequence of the columns in the index should match your WHERE clause. For example, if you create an index on Col A and Col B but query on Col C, your index will not be used. So formulate your indexes according to your WHERE clauses.
When in doubt about whether an index was used, run EXPLAIN to see which index was chosen, as in the sketch below.
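A minimal sketch of the column-order point, using a hypothetical `orders` table in MySQL-style syntax:

```sql
-- Composite index: status is the leading column, customer_id second.
CREATE INDEX idx_orders_status_customer ON orders (status, customer_id);

-- Can use the index: the WHERE clause covers the leading column.
EXPLAIN SELECT * FROM orders WHERE status = 'shipped' AND customer_id = 42;

-- Cannot seek on the index: the leading column (status) is missing,
-- so the optimizer typically falls back to a scan.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
```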
DB indexes can be a tricky subject for beginners, but imagining an index as a tree helps you visualize the path traversed when reading the data.

-
Indexes do _not_ slow down writes nearly as much as they speed up reads. – Rick James Oct 08 '22 at 20:09
-
It's true that indexes do not slow writes nearly as much as they speed up reads. I've seen an 8x decrease in write performance after creating 3 indexes on a table containing 1 million records. The indexes created were composite indexes based on the app's common access patterns. The insert query time increased from ~400 ms to around 3-3.5 s. That is negligible compared to the gains in read performance, which improved from ~150 s to 2-3 s. The increase in write time can be ignored if the application is read-heavy, but it needs to be kept in mind for write-heavy applications. – Shubhansh Vatsyayan Oct 10 '22 at 10:26
-
Were those inserts one row at a time, or batch insertions? (Batching, by itself, may give a 10x speedup.) – Rick James Oct 10 '22 at 17:28
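For context, a minimal sketch of the difference (table and columns hypothetical):

```sql
-- One statement per row: one round trip and one index update per insert.
INSERT INTO orders (id, status) VALUES (1, 'new');
INSERT INTO orders (id, status) VALUES (2, 'new');

-- Batched: many rows in one statement, so the per-statement overhead
-- and index maintenance are amortized across the batch.
INSERT INTO orders (id, status) VALUES (3, 'new'), (4, 'new'), (5, 'new');
```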
-
The application's use case required mostly one-row insertions. There are multiple consumers that call the insert, which is how the data grew to 1M rows. – Shubhansh Vatsyayan Oct 14 '22 at 16:21