I have a table with 100 million rows and this is only going to continue to grow.
In order to speed up select
queries, I want to partition this table, what is the best approach to doing this? I have never partitioned a table before and I'm not sure where to even begin.
I do have indexes on the columns most often used for where
clauses, group by
, or order by
.
Edit Part 1: Right now, my select queries take approximately 55-60 seconds each, with indexes.
Edit Part 2: My table is not normalized up to 3NF. It's currently one giant table in a 1NF form. The table has approximately 13 columns with 100 million + rows.
I only have indexes on columns that I use to filter my select queries on.
The cardinality of the columns that are indexed is pretty high with many different values (for example, email, employee number, dates, etc.)
Edit Part 3: I looked at the link that Mark posted and it is a very good reference, I just have a few questions about the function that they used.
CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
IF ( NEW.logdate >= DATE '2006-02-01' AND
NEW.logdate < DATE '2006-03-01' ) THEN
INSERT INTO measurement_y2006m02 VALUES (NEW.*);
ELSIF ( NEW.logdate >= DATE '2006-03-01' AND
NEW.logdate < DATE '2006-04-01' ) THEN
INSERT INTO measurement_y2006m03 VALUES (NEW.*);
...
ELSIF ( NEW.logdate >= DATE '2008-01-01' AND
NEW.logdate < DATE '2008-02-01' ) THEN
INSERT INTO measurement_y2008m01 VALUES (NEW.*);
ELSE
RAISE EXCEPTION 'Date out of range. Fix the measurement_insert_trigger() function!';
END IF;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;
What exactly does $$ mean? I have never created a function before using SQL and am new to programming in general.
When you are using the NEW.*
part, NEW
refers to the new values being entered in? It's not a particular table or anything, it's just new rows being entered. You use "NEW" to represent the new rows, correct? I have been experimenting with this and that's the conclusion that I came to, but I just wanted to see what others thought.
Edit Part 4: $$ is apparently used to indicate when you are starting and ending a function. It's also used to write string constants. I learned this from What are '$$' used for in PL/pgSQL