
I have created a program that outputs approximately 300,000 rows of data per day to store in a database. I am a bit new to SQL, though, and I was wondering which is better practice: programmatically creating a new table each day (say, with the date as a suffix), so I get a new 300,000-row table per day, or storing everything in one table, inserting into it every day, and using a DATETIME column to tell the days apart.

I am leaning towards the second option, but I am unsure whether I would eventually hit a "cap" on the number of rows by adding 300,000 rows daily. That's over 100 million rows of data a year; can a MySQL database handle that?
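For reference, this is roughly what I mean by the second option; the table and column names here are just placeholders, not my actual schema:

    CREATE TABLE readings (
        id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        recorded_at DATETIME        NOT NULL,
        value       DECIMAL(10,2)   NOT NULL,   -- stand-in for whatever the program outputs
        PRIMARY KEY (id),
        KEY idx_recorded_at (recorded_at)       -- so per-day queries can use an index
    ) ENGINE=InnoDB;

    -- fetching one day's worth of rows then stays simple:
    SELECT *
    FROM readings
    WHERE recorded_at >= '2014-11-13'
      AND recorded_at <  '2014-11-14';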

Aaron McKelvie
  • Unless you're running MySQL on a Windows server using the FAT/FAT32 file systems, or an old Linux 2.2, I don't think you need to worry about table size. See the documentation on limits: https://dev.mysql.com/doc/refman/5.7/en/table-size-limit.html – jpw Nov 14 '14 at 00:37
  • The real question would be whether you need to keep all the data online, and how you plan to query it. – jpw Nov 14 '14 at 00:37

3 Answers


The only real hard limit on the number of rows you can store in a database table is the size of its primary key.

E.g. if your primary key is an unsigned int, it uses 4 bytes and tops out around 4.3 billion rows. If you use a bigint... it's a LOT more.

In my opinion it's far better to put them all in one table; just make the table as lean and performance-oriented as possible.

E.g. if you can use short, compact formats instead of big date/time-stamp strings, do it. If roughly 4.3 billion rows is plenty, use an int rather than a bigint (that saves 4 bytes per row).

Preferably do not have any GUIDs in your table, especially not as the primary key. If possible, use a single primary-key column.
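Just to illustrate those points (the column names here are made up, not from your application), a deliberately lean table could look something like this:

    -- sketch of a narrow table: 4-byte surrogate key, no GUIDs
    CREATE TABLE log_entries (
        id        INT UNSIGNED     NOT NULL AUTO_INCREMENT,  -- 4 bytes, ~4.3 billion values
        logged_at DATETIME         NOT NULL,                  -- compact, fixed-width date/time
        status    TINYINT UNSIGNED NOT NULL,                  -- 1 byte instead of a text label
        PRIMARY KEY (id)
    ) ENGINE=InnoDB;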

I'm sure others can chip in with more.

There are other constraints besides the primary key size too, e.g. how much physical space you have on the drive where the database files live, for one.

Ryan Mann

In the same vein as @jpw's comment: what do you want to do with the data?

To put some boundaries / context around your question:

  • 300,000 rows isn't that big. I remember playing with 2,000,000 row datasets in Access '95.
  • Again, as @jpw's link pointed out, the restrictions are based on file size, not the number of rows. 300,000 rows of CSV-like data is very different from 300,000 rows of binary blob data.

What might help:

  • Databases are excellent at storing and retrieving similar data, e.g. a log table (compared to a filesystem, which just stores lots of separate files)
  • They are also great at indexing / searching / filtering / sorting

Have you tried benchmarking it? I.e. you have one day of records; try importing that dataset 365 times and see what happens.
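As a rough sketch (assuming a table like the readings one sketched in the question, so the names are placeholders), you could keep re-inserting the same day's rows shifted forward one day at a time:

    -- duplicate one day's rows, shifted forward by one day; repeat from a small script,
    -- bumping the interval each pass, then time your real queries against the grown table
    INSERT INTO readings (recorded_at, value)
    SELECT recorded_at + INTERVAL 1 DAY, value
    FROM readings
    WHERE recorded_at >= '2014-11-13'
      AND recorded_at <  '2014-11-14';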

Christian Payne

As your application data grows, you can consider sharding, i.e. horizontal scaling of the data. Sharding divides the data set and distributes it over multiple servers, or shards. Each shard is an independent database, and collectively the shards make up a single logical database.

aarti