This may have been asked before, but here's the situation anyway.
I have one big table (MySQL with InnoDB) that is basically a huge log; no fancy relational stuff.
Three fields: Customer_ID, a timestamp, and a data field (TINYTEXT, with values like 'Visited Front Webpage' or 'Logged In').
Since I'm logging the activity of clients on a site that receives around 10,000 users a day, the table grows pretty fast.
At any given moment I want to know how many distinct clients have actually done anything on the site.
So I run 'SELECT DISTINCT Customer_ID FROM table;', and I've noticed that as the table grows the query takes longer, which is perfectly fine and totally expected. At some point, though, it started taking more than 5 minutes to complete.
I wanted to find a faster way, so I tried this. Let's say I'm working with a table of 1 million rows. I split that table into 10 tables of 100K rows each, ran 'SELECT DISTINCT Customer_ID FROM table;' against each one, and then merged all the results through 'sort | uniq | wc' on the command line, arriving at the same count (roughly what the sketch below does).
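Here's a minimal sketch of that merge step, assuming the chunks are named mytable_0 through mytable_9 (the table names and client flags are illustrative, not my exact script):

for i in $(seq 0 9); do
  # -N skips column headers, -B gives plain tab-separated batch output
  mysql -N -B -e "SELECT DISTINCT Customer_ID FROM mytable_$i;" mydb
done | sort | uniq | wc -l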
Surprisingly, that method took less than half the time of the single big query.
I've pretty much answered my own question: ten 100K-row tables are faster than one 1M-row table. BUT maybe I'm doing something wrong, or maybe it's really a performance-tuning issue, because a table should be designed to perform well no matter its size.
Let me know what you think.
Thanks for reading.
UPDATE: Here's how I create my table:
CREATE TABLE `mydb`.`mytable` (
  `Customer_ID` BIGINT(20) UNSIGNED NOT NULL,
  `unix_time` INT(10) UNSIGNED NOT NULL,
  `data` TINYTEXT NOT NULL,
  -- secondary index on Customer_ID (no explicit primary key)
  KEY `Customer_ID` (`Customer_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
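For what it's worth, the 100K-row chunks for the test above were built with something along these lines (a rough sketch; the chunk name mytable_0 and the LIMIT/OFFSET values are illustrative, and without an ORDER BY the row order isn't guaranteed):

CREATE TABLE `mydb`.`mytable_0` LIKE `mydb`.`mytable`;
-- copy one 100K-row slice; repeat with a different OFFSET for each chunk
INSERT INTO `mydb`.`mytable_0` SELECT * FROM `mydb`.`mytable` LIMIT 100000 OFFSET 0;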