
I am building an app to support 200,000+ registered users, and I want to add address book functionality so each user can import their own contacts (e.g. name, address, email, etc.). Each user will have c.150 different contacts, with 10-15 fields per record.

My question is simple: given the volume of users and the number of contacts for each user, is it better to create individual tables for each user's addressbook, or one single table with a user_id lookup for that associated user account?

If you could explain why from a performance perspective, that would be much appreciated.

UPDATE: Specifications

In response to questions in the comments, here are the specifications: I will be hosting the database on AWS RDS (http://aws.amazon.com/rds). It will be primarily a heavy read load, rather than write. Writes will be a balance between INSERTs and UPDATEs, with few DELETEs. Imagine the number of times you view vs. edit your own address book.

Thanks

alias51
  • Having one table is much better because it is easier to retrieve and maintain. You can partition your table later, though. Several million records is not that much for MySQL – Sam Dec 04 '13 at 18:46
  • Need more information than this... what MySQL version? What storage engine? A table description would be nice... will it be a heavy-write or heavy-read application, or both? And what performance perspective is this based on: INSERTs, UPDATEs or DELETEs? – Raymond Nijland Dec 04 '13 at 18:49
  • Here's a simple question that you need to ask yourself: are you using a mechanical HDD or an SSD? The difference is *huge*. But, for a simple answer - one table is easier to manage. It's also a violation of many (logical, even) rules to have a table per user. It basically makes 0 sense. It's like shooting yourself in the foot 5 minutes before a big race and asking whether that was a good move or not. – N.B. Dec 04 '13 at 19:00
  • @N.B. It's Amazon RDS, so I assume less relevant? – alias51 Dec 04 '13 at 19:33

3 Answers


Specific answer in response to specifications

One table for contacts' data, with an indexed foreign key column back to the user. Finding a particular user's contacts will require about 3 seeks, a relatively small number. Use an SSD if seeks are bottlenecking you.
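A minimal sketch of that single-table layout (all names, types and lengths here are illustrative assumptions, not taken from the question):

```sql
-- Hypothetical single contacts table; one row per contact, all users share it.
-- user_id is a logical foreign key back to the users table; the index on it
-- makes "fetch this user's ~150 contacts" a cheap range scan.
CREATE TABLE contacts (
    contact_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id    BIGINT UNSIGNED NOT NULL,
    name       VARCHAR(255),
    email      VARCHAR(255),
    phone      VARCHAR(50),
    -- ...the remaining 10-15 per-contact fields...
    PRIMARY KEY (contact_id),
    KEY idx_contacts_user (user_id)
) ENGINE=InnoDB;

-- Typical lookup: one index dive plus a handful of row fetches.
-- SELECT name, email, phone FROM contacts WHERE user_id = 42;
```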

If your 15 columns have 100 bytes each, and you have 150 of those rows per user, then your maximum data transfer per user is on the order of 256 KB. I would design the application to show only the contact data required up front (say the top 3 most useful contact points -- name, email, phone), then pull more specifics when requested for particular contacts. In the (presumably) rare cases when you need all contacts' info (e.g. export to CSV), consider SELECT INTO OUTFILE if you have that access. vCard output would be less performant: you'd need to get all the data, then stuff it into the right format. If you need vCard often, consider writing the vCard out when the database is updated (a caching approach).
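If you do have that access, the CSV export could use SELECT ... INTO OUTFILE along these lines (the table, columns and path are hypothetical; this requires the FILE privilege and writes to the database server's own filesystem, which managed hosts such as RDS restrict):

```sql
-- Dump one user's contacts straight to a CSV file on the server.
SELECT name, email, phone
  INTO OUTFILE '/tmp/contacts_user42.csv'
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  FROM contacts
 WHERE user_id = 42;
```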

If performance requirements are still not met, consider partitioning on the user id.

General answer

Design your schema around KISS and your performance requirements, while documenting the scalability plan.

In this particular situation, the volume of data does not strike me as being extreme, so I would lean KISS toward one table. However, it's not clear to me the kind of queries you will be making -- JOIN is the usual performance hog, not a straight SELECT. Also what's not clear to me is your SELECT/UPDATE mix. If read-heavy and by user, a single table will do it.

Anyway, if after implementation you find the performance requirements aren't met, I would suggest you consider scaling via faster hardware, a different engine (e.g. MyISAM vs. InnoDB -- know what the differences are for your particular MySQL version!), materialized views, or partitioning (e.g. on the first letter of the corresponding username -- presuming you have one).
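If partitioning on the user id does become necessary, a hedged sketch (assuming a contacts table with a user_id column, as above; the partition count is arbitrary):

```sql
-- MySQL requires every unique key on a partitioned table to include the
-- partitioning column, so the primary key must cover user_id first.
ALTER TABLE contacts
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (contact_id, user_id);

-- HASH partitioning keeps all of one user's rows in a single partition.
ALTER TABLE contacts
    PARTITION BY HASH (user_id)
    PARTITIONS 16;
```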

bishop
  • You should really read this "Compared to MyISAM, InnoDB delivered 35x higher throughput on the Read / Write test and 5x higher throughput on the Read-Only test, with 90% scalability across 36 CPU cores. "... http://www.oracle.com/partners/en/knowledge-zone/mysql-5-5-innodb-myisam-522945.pdf and https://blogs.oracle.com/MySQL/entry/comparing_innodb_to_myisam_performance – Raymond Nijland Dec 04 '13 at 18:53
  • "Faster" is a loaded word. Whether it's faster in your case or not depends on what you're doing. See for example [this post on MySQL performance blog](http://www.mysqlperformanceblog.com/2009/01/12/should-you-move-from-myisam-to-innodb/). (And I don't think a -1 is fair, read that post for the pros and cons -- I'm not saying in general MyISAM is faster: I caveated with "I don't have all the facts".) – bishop Dec 04 '13 at 18:55
  • InnoDB is far superior to MyISAM in several ways watch this http://vimeo.com/20990641 – Sam Dec 04 '13 at 18:55
  • -1 is for the answer not being a good answer. MyISAM / InnoDB differ in so many things, and since you're reading the Percona blog then I'm sure you are familiar with the differences. There is virtually no reason in this day and age why one would use MyISAM, which has really given its best so far. Suggesting an outdated engine over an engine that's really, really well made is a bad suggestion, and someone will probably google your answer and think that MyISAM is indeed faster or a better choice. Edit: removed the downvote. – N.B. Dec 04 '13 at 18:57
  • @N.B. Understood and point taken. My word choice in my original answer was wrong. I removed "faster" in my answer. That's the wrong word. I replaced with "different", as MyISAM /might/ be a /good/ choice in this /particular/ situation. – bishop Dec 04 '13 at 18:58
  • MyISAM will table lock on every insert/update... not good for this question.. InnoDB is really the best option in this case and yes there are cases when you may want to use MyISAM (cant think one right now...) – Raymond Nijland Dec 04 '13 at 18:58
  • @RaymondNijland: I have run into [this situation](http://stackoverflow.com/questions/19267507/how-to-optimize-count-performance-on-innodb-by-using-index), and I use MyISAM to overcome it. – bishop Dec 04 '13 at 19:02
  • @bishop Yes, InnoDB is a transaction-based engine, so a full table scan is needed... MyISAM cheats by storing the record COUNT within the MyISAM files (which makes it a cheap query to answer). This one is a good example indeed, but most likely the InnoDB engine will outperform the MyISAM engine when COUNT is used in combination with a WHERE on a PRIMARY or secondary index – Raymond Nijland Dec 04 '13 at 19:09
  • @RaymondNijland: Agreed. I've read (but not tested) that 5.7 might support the WHERE and INDEX optimization for COUNT on InnoDB, but my particular case I have a further constraint: I can't upgrade past v5.5 for now. Incidentally, another example for MyISAM over InnoDB is that before v5.7, InnoDB didn't have spatial types -- I had to resort to hacking the underlying blob to get spatial queries I wanted. – bishop Dec 04 '13 at 19:52

Have a single table, but partition it by the first letter of the last name: all last names starting with A will be loaded into one partition, all names starting with B into another, and so on.

You could also do some amount of profiling to find the right distribution key.
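One way to express that split, sketched under the assumption of a single contacts table with a last_name column (partition names and boundaries are illustrative; note that MySQL requires the partitioning column to appear in every unique key on the table):

```sql
-- RANGE COLUMNS partitioning on last_name: 'A...' names land below 'B',
-- 'B...' names below 'C', and everything else in the catch-all partition.
ALTER TABLE contacts
    PARTITION BY RANGE COLUMNS (last_name) (
        PARTITION p_a    VALUES LESS THAN ('B'),
        PARTITION p_b    VALUES LESS THAN ('C'),
        -- ...one partition per letter...
        PARTITION p_rest VALUES LESS THAN (MAXVALUE)
    );
```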

Senthil

I'm not a DBA, but I suggest you properly normalize the database, add indexes, etc., and not bugger it up to meet a possibly nonexistent performance issue. If possible, have a DBA review your schema. I don't think 200,000 users is excessive. All 200,000 users are not likely to hit the update button in the same x milliseconds it takes to process one person's input. Only a few will be logged in at any time, and most of them will be filling out data or staring at existing data on the web page rather than hitting that update button. If by chance a bunch of them do hit it at the same time, there will probably be a performance wait rather than a crash. Here is a rough layout for your schema (mileage may vary):

User
long userID primary key
String firstName
String lastName

Contact
long contactID primary key
long userID foreign key
String firstName
String lastName

Address
long addressID primary key
long contactID foreign key
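The rough layout above, rendered as MySQL DDL (types and lengths are guesses; only the keys and relationships come from the layout):

```sql
CREATE TABLE User (
    userID    BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    firstName VARCHAR(100),
    lastName  VARCHAR(100)
) ENGINE=InnoDB;

CREATE TABLE Contact (
    contactID BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    userID    BIGINT NOT NULL,
    firstName VARCHAR(100),
    lastName  VARCHAR(100),
    FOREIGN KEY (userID) REFERENCES User (userID)
) ENGINE=InnoDB;

CREATE TABLE Address (
    addressID BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    contactID BIGINT NOT NULL,
    -- ...address fields (street, city, etc.) would go here...
    FOREIGN KEY (contactID) REFERENCES Contact (contactID)
) ENGINE=InnoDB;
```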

user2810910