I've built a simple hit counter on my website (PHP & MySQL, using CodeIgniter as my framework).

This is the table I use:

CREATE TABLE page_hits (id INT NOT NULL AUTO_INCREMENT, page_url VARCHAR(350) NOT NULL, ip VARCHAR(11) NOT NULL, hits INT NOT NULL, `date` TIMESTAMP DEFAULT CURRENT_TIMESTAMP  NOT NULL, PRIMARY KEY (id));

On every page load, I check if the combination of page_url & ip exists in the table. If it does, I increment the value of hits by 1. If not, I create a new row in the table. The timestamp is there to allow a certain delay between hit counts, so as not to count a page refresh as a new hit.
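In SQL terms, the check-and-increment logic could be collapsed into a single upsert. This is just a sketch, not my actual CodeIgniter code: it assumes a UNIQUE index on (page_url, ip) and ignores the refresh-delay check on the timestamp:

```sql
-- Assumes: ALTER TABLE page_hits ADD UNIQUE INDEX (page_url, ip);
-- Inserts a new row on first visit, otherwise increments the counter.
INSERT INTO page_hits (page_url, ip, hits)
VALUES ('/some/page', '192.0.2.1', 1)
ON DUPLICATE KEY UPDATE hits = hits + 1;
```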

It all works nicely, but I'm afraid I might be overloading my database... In less than 24 hours, I have over 6500 lines in the page_hits table.

So my question is: What are the risks of having such a rapidly growing table in my database? (performance issues? exceeding database size limitation?)

Jonathan Eustace
einav

1 Answer

Let me start by rewriting your single-line SQL command:

CREATE TABLE page_hits 
(id       INT NOT NULL AUTO_INCREMENT, 
 page_url VARCHAR(350) NOT NULL, 
 ip       VARCHAR(11) NOT NULL, 
 hits     INT NOT NULL, 
 date     TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL, 
 PRIMARY KEY (id))

Now I can see what's there.

Your table is not complex, but it will grow quickly. This will not be a problem, as long as you don't do anything with it. In other words: Adding rows to the table is not a problem, even if you have a million rows.

However, as soon as you start to query this table, you will find it gets slow very quickly. You've forgotten to add indexes.

How do I add indices to MySQL tables?
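For example (assuming you mainly filter on `page_url` and `ip`; a prefix index keeps the key small for the long VARCHAR(350) column):

```sql
-- Illustration only: index the columns your WHERE clauses actually use.
ALTER TABLE page_hits
    ADD INDEX idx_page_url (page_url(100)),  -- prefix index on the long URL column
    ADD INDEX idx_ip (ip);
```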

Secondly, you could think about normalizing your table to get rid of redundant info. For instance, these three smaller tables:

CREATE TABLE page_hits 
    (id         INT NOT NULL AUTO_INCREMENT, 
     page_id    INT NOT NULL, 
     client_id  INT NOT NULL, 
     hits       INT NOT NULL, 
     PRIMARY KEY (id))

CREATE TABLE pages 
    (id       INT NOT NULL AUTO_INCREMENT, 
     page_url VARCHAR(350) NOT NULL, 
     PRIMARY KEY (id))

CREATE TABLE clients 
    (id       INT NOT NULL AUTO_INCREMENT, 
     ip       VARCHAR(11) NOT NULL, 
     date     TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL, 
     PRIMARY KEY (id))

Here page_id refers to the pages table, and client_id refers to the clients table. You will have to add the indexes yourself. I would get rid of the date column, and solve this problem with a cookie. Note that the new tables can easily be expanded to contain more info without getting too big too quickly.
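As an illustration, a hypothetical reporting query against these tables (total hits per page) would join `page_hits` back to `pages`:

```sql
-- Total hits per page, joining the hits table back to the lookup table.
SELECT p.page_url, SUM(h.hits) AS total_hits
FROM page_hits AS h
JOIN pages AS p ON p.id = h.page_id
GROUP BY p.page_url
ORDER BY total_hits DESC;
```

This is also why `page_id` and `client_id` need indexes of their own.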

KIKO Software
  • Thanks! and sorry for the poor layout of the SQL in my original post. – einav Mar 16 '15 at 14:44
  • It got downvoted. If it stays that way I will remove this answer. It would be helpful if I knew what I have done wrong, so I can improve myself. – KIKO Software Mar 16 '15 at 14:47
  • I now added indexes on `page_url` and on `ip`, as those are the fields I'll be querying. But I didn't understand what's the benefit of using three separate tables instead of the one I'm using. Will this normalization improve performance? – einav Mar 16 '15 at 14:47
  • And I don't know who downvoted you and why. I just upvoted your answer. – einav Mar 16 '15 at 14:49
  • Yes, very weird. It happens a lot, downvotes without any feedback. It's not nice. Thanks for the upvote. – KIKO Software Mar 16 '15 at 14:54
  • As to your question: Think about it, every hit by anybody will create a new row in the `page_hits` table. By removing `page_url` and `ip` from that table I've made each row a lot smaller. And the other two tables (`pages` and `clients`) won't grow as fast because you reuse the rows. This is the essence of normalisation. – KIKO Software Mar 16 '15 at 14:57
  • True. Every row in `page_hits` will now be smaller. But with every page load I will have to query 2 tables (`clients` & `pages`), and add a new row (or update an existing row) to the third table (`page_hits`). Also, `page_hits` will use 2 foreign keys. Wouldn't that burden the database? – einav Mar 16 '15 at 15:18
  • You must analyse how your data will be used, and structure your database to suit. Normalisation is appropriate in many cases, but not all. – rjdown Mar 16 '15 at 15:27
  • I must agree with rjdown, it all depends on what you do. Yes, in this case you have to do more processing, but your tables will stay smaller. On a high traffic server that is the way to go. If you have few visitors, and a slow server, your own single table is more efficient. – KIKO Software Mar 16 '15 at 15:30
  • How would you define "few visitors" vs "high traffic"? My original `page_hits` table is now growing by around 7000 lines every 24 hours. – einav Mar 16 '15 at 15:38
  • And one last thing, @KIKOSoftware If I were to use your proposed 3-table structure, which columns would you suggest I should index? And should I define `page_id` and `client_id` as foreign keys? – einav Mar 16 '15 at 16:27
  • Columns which you use in the WHERE section of SELECT commands certainly need to be indexed. 7000 rows per day is 2.5 million per year. If you collect data for several years then I would think that's a bit much. MySQL can easily handle 10 million rows, but if you look at the database file size you might see why it could become a problem. Suppose your row is 400 bytes; multiply this by 10 million and that's 4 GB. Too big, in my opinion. My row will be about 1/20th of that size. All this is without any indexes, which will only make the file size bigger. – KIKO Software Mar 16 '15 at 19:19