
In a MySQL database, I have one table with 330 columns; each column holds either a float or an integer value. The entries are indexed by a millisecond time stamp column. Over the life of the application there are expected to be on the order of 100–200 million entries. This table is completely independent and has no relations to other tables. The only queries are ones filtering by the time stamp index.
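To make that concrete, here is a rough sketch of the table shape and the only query pattern; the table and column names below are simplified placeholders, not the real schema:

    -- Sketch only: the real table has ~330 float/int sensor columns.
    CREATE TABLE sensor_samples (
        ts_ms     BIGINT NOT NULL,   -- millisecond time stamp
        temp_1    FLOAT,
        voltage_1 FLOAT,
        relay_1   TINYINT,
        -- ... roughly 330 such columns in total ...
        PRIMARY KEY (ts_ms)
    );

    -- The only query shape: filter on the time stamp index.
    SELECT *
    FROM sensor_samples
    WHERE ts_ms BETWEEN 1385070000000 AND 1385073600000;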

Assuming I have a modern Intel server with 6 cores, 32 GB of RAM, and enough disk storage for the data, will any size limits be hit, or will performance significantly degrade?

If there will be problems, what should be done to mitigate them?

I know similar questions have been asked, but the answer always seems to be "it depends." Hopefully I've provided enough information so that a definitive answer can be determined.

Thor
  • Well, it depends... It really does. Without understanding the full call pattern against that table, it would be impossible to give definitive information on recommended hardware specs. Also, even though the table is independent, if there are other tables in the DB or other databases on the server, you need to understand what is happening with them as well. That being said, from practical experience, hundreds of millions of rows should not be a problem in a properly indexed database with reasonable hardware and MySQL tuning. Of course, 330 columns is a little crazy, but not too bad if there are no indexes on them. – Mike Brant Nov 21 '13 at 21:22
  • "It depends." :) The database sits behind a web server and it is the only client. There are a couple of other tables, but they are extremely small ( <5 columns and <10 entries); just state data. The other tables are rarely read or updated. This really is just a simple application holding a lot of data entries. – Thor Nov 21 '13 at 21:30
  • See this section of MySQL documentation: http://dev.mysql.com/doc/refman/5.5/en/column-count-limit.html – Barmar Nov 21 '13 at 21:48
  • I'm more worried about whether you've designed the database properly. 330 columns suggests a possible lack of normalization. – Barmar Nov 21 '13 at 21:48
  • Forgive my ignorance, what is meant by normalization in this case? Each column is a value for a specific characteristic (temperature, voltage, on/off, etc.) captured by various sensors at a specific time. – Thor Nov 21 '13 at 22:28
  • How can this question be "narrowed" to allow for better answers? What other details are needed? – Thor Nov 21 '13 at 22:32
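To illustrate the normalization point raised in the comments above: instead of one wide row with ~330 sensor columns, each reading could be stored as its own row in a tall, narrow table. A minimal sketch, using hypothetical names:

    -- Hypothetical normalized alternative: one row per (time stamp, sensor) reading.
    CREATE TABLE sensor_readings (
        ts_ms     BIGINT NOT NULL,             -- millisecond time stamp
        sensor_id SMALLINT UNSIGNED NOT NULL,  -- which of the ~330 characteristics
        value     DOUBLE NOT NULL,             -- temperature, voltage, on/off as 0/1, ...
        PRIMARY KEY (ts_ms, sensor_id)
    );

    -- Time-range queries still work, and can be narrowed to specific sensors.
    SELECT ts_ms, sensor_id, value
    FROM sensor_readings
    WHERE ts_ms BETWEEN 1385070000000 AND 1385073600000
      AND sensor_id IN (1, 2, 3);

Whether that is worth doing here is debatable: it trades roughly 330 times as many (much smaller) rows for schema flexibility, and for a fixed set of sensors queried only by time range, the wide table can be perfectly reasonable.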

1 Answer


Wow such huge hardware for such a small dataset!

You will not have any insurmountable problems with this dataset.

330 columns × 8 bytes = 2,640 bytes (maximum) per row
2,640 bytes × 200 million rows ≈ 491 GiB

It's big, but not huge. It really depends on what you're going to do with the data. If you are only ever 'appending' to the data, never updating existing rows or inserting out of order (in your case, inserting rows with earlier timestamps), then that eliminates two potential causes for concern.

If you are querying on the timestamp index, are you going to be using a RANGE or a specific timestamp?

Querying over ranges will be fine: make your timestamp the clustered index column. Since you are performing some earlier (out-of-order) inserts, this can cause your table to fragment, but that won't be a really big problem; if it does, you can defragment the table.
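As a sketch of what that means in practice, reusing the hypothetical sensor_samples table from the question: with InnoDB the primary key is the clustered index, so putting the time stamp there keeps range scans largely sequential, and the table can be rebuilt if it ever fragments badly.

    -- InnoDB clusters rows by the PRIMARY KEY, so a time stamp primary key
    -- keeps range scans over contiguous periods cheap.
    SELECT *
    FROM sensor_samples
    WHERE ts_ms >= 1385070000000
      AND ts_ms <  1385073600000;

    -- If out-of-order inserts fragment the table, it can be rebuilt:
    OPTIMIZE TABLE sensor_samples;  -- for InnoDB this rebuilds the table (ALTER TABLE ... FORCE)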

A big choice for performance is InnoDB versus MyISAM. InnoDB has full transactional capability, but do you need it? It involves roughly twice as many writes, but for a single table with no referential integrity you're probably going to be OK - more here.
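The engine is chosen per table, so it is easy to experiment. A small sketch, again using the hypothetical table name; note that converting an existing table rewrites every row, which is slow at this size:

    -- Check which engine a table currently uses.
    SHOW TABLE STATUS LIKE 'sensor_samples';

    -- Switch engines (rewrites the whole table, so decide before loading data if possible).
    ALTER TABLE sensor_samples ENGINE = MyISAM;   -- or ENGINE = InnoDB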

Dave Hilditch