
I am using Postgres 8.4 on Windows 7 (32-bit) with 4 GB RAM and a 2.5 GHz CPU.

I have a database in Postgres with 10 tables: t1, t2, t3, ... t10.

t1 has a sequence-generated id as its primary key, which is referenced as a foreign key by all the other tables.

Data is inserted into all tables. Apart from t1, each table holds nearly 50,000 rows referencing a single row in t1. Then I insert a 2nd row into t1 and again 50,000 rows per table referencing this new key.

The issue is when I want to delete all the data entries that are present in other tables:

DELETE FROM t1 WHERE c1 = 1;

This query takes nearly 10 minutes to execute.

I also created indexes and tried again, but performance did not improve at all. What can be done?

I have included a sample schema below:

CREATE TABLE t1
(
  c1 numeric(9,0) NOT NULL,
  c2 character varying(256) NOT NULL,
  c3ver numeric(4,0) NOT NULL,
  dmlastupdatedate timestamp with time zone NOT NULL,
  CONSTRAINT t1_pkey PRIMARY KEY (c1),
  CONSTRAINT t1_c1_c2_key UNIQUE (c2)
);

CREATE TABLE t2
(
  c1 character varying(100),
  c2 character varying(100),
  c3 numeric(9,0) NOT NULL,
  c4 numeric(9,0) NOT NULL,
  tver numeric(4,0) NOT NULL,
  dmlastupdatedate timestamp with time zone NOT NULL,
  CONSTRAINT t2_pkey PRIMARY KEY (c3),
  CONSTRAINT t2_fk FOREIGN KEY (c4)
      REFERENCES t1 (c1) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT t2_c3_c4_key UNIQUE (c3, c4)
);

CREATE INDEX t2_index ON t2 USING btree (c4);

Let me know if there is anything wrong with the schema.

Erwin Brandstetter
user2509885
  • Without the definitions for the other tables, it's impossible to tell, but it's probably due to missing indexes on e.g. `t2(t1_id)` or whatever the referencing fields are named. – Denis de Bernardy Jun 21 '13 at 18:06
  • other tables are same as t2 but with more no of columns. and for now i have removed idexes – user2509885 Jun 21 '13 at 18:17
  • What do you mean by `for now i have removed idexes`? We need to see the indexes on involved columns, they are essential. Along with *all* other constraints and triggers that might be involved. Also: your constraint names are very misleading. Did you mess them up when posting here or in your database? – Erwin Brandstetter Jun 21 '13 at 19:12
  • CREATE INDEX t2_index ON t2 USING btree (c1); yes i had change the constraint name db names column names – user2509885 Jun 21 '13 at 19:12
  • check now if you are able to understand – user2509885 Jun 21 '13 at 19:23
  • You should have just fixed the misleading constraint names. `t1`, `c1` etc. like you had it were a lot easier to read. And unquoted CaMeL case identifiers are a sign of manual tampering, because proper identifiers should be *all lower case*. (Or you have to double-quote, but don't get into that nonsense.) Also, your example contradicts the text now. I suppose you roll back and just fix the misleading names. – Erwin Brandstetter Jun 21 '13 at 19:31
  • please ask me what you are not understanding because i am not able to create a better example than this else will give you the orginal table structure if you want – user2509885 Jun 21 '13 at 19:40
  • I think I understand perfectly well, but this is a public forum where everybody should be able to understand. If it's ok with you, I'll roll back your changes and just fix the offending identifiers ... – Erwin Brandstetter Jun 21 '13 at 19:53

1 Answer


With bigger tables and more than just two or three values, you need an index on the referenced column (t1.c1) as well as the referencing columns (t2.c4, ...).

But if your description is accurate, that cannot be the cause of the performance problem in your scenario. Since you have only 2 distinct values in t1, an index is of no use there; a sequential scan will be faster.
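For the general case with many distinct values, the cascading DELETE benefits from an index on each referencing column, because the FK trigger for ON DELETE CASCADE has to locate all child rows. A sketch against the sample schema (index names are my own; repeat for t3 ... t10 with their respective FK columns):

```sql
-- Index on the referencing column lets the ON DELETE CASCADE trigger
-- find child rows with an index scan instead of a sequential scan.
CREATE INDEX t2_c4_idx ON t2 (c4);

-- Likewise for every other table referencing t1.c1, e.g.:
-- CREATE INDEX t3_c4_idx ON t3 (c4);
```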

Anyway, I re-enacted what you describe in Postgres 9.1.9:

CREATE TABLE t1
( c1 numeric(9,0) PRIMARY KEY,
  c2 character varying(256) NOT NULL,
  c3ver numeric(4,0) NOT NULL,
  dmlastupdatedate timestamptz NOT NULL,
  CONSTRAINT t1_uni_key UNIQUE (c2)
);

CREATE TABLE t2
( c1 character varying(100),
  c2 character varying(100),
  c3 numeric(9,0) PRIMARY KEY,
  c4 numeric(9,0) NOT NULL,
  tver numeric(4,0) NOT NULL,
  dmlastupdatedate timestamptz NOT NULL,
  CONSTRAINT t2_uni_key UNIQUE (c3, c4),
  CONSTRAINT t2_c4_fk FOREIGN KEY (c4)
      REFERENCES t1(c1) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE
);

INSERT INTO t1 VALUES 
 (1,'OZGPIGp7tgp97tßp97tß97tgP?)/GP)7gf', 234, now())
,(2,'agdsOZGPIGp7tgp97tßp97tß97tgP?)/GP)7gf', 4564, now());

INSERT INTO t2
SELECT 'shOahaZGPIGp7tgp97tßp97tß97tgP?)/GP)7gf'
     , 'shOahaZGPIGp7tgp97tßp97tß97tgP?)/GP)7gf'
     , g, 2, 456, now()
FROM   generate_series(1, 50000) g;

INSERT INTO t2
SELECT 'shOahaZGPIGp7tgp97tßp97tß97tgP?)/GP)7gf'
     , 'shOahaZGPIGp7tgp97tßp97tß97tgP?)/GP)7gf'
     , g, 2, 789, now()
FROM   generate_series(50001, 100000) g;

ANALYZE t1;
ANALYZE t2;

EXPLAIN ANALYZE DELETE FROM t1 WHERE c1 = 1;

Total runtime: 53.745 ms

DELETE FROM t1 WHERE c1 = 1;

58 ms execution time.

Ergo, there is nothing fundamentally wrong with your schema layout.

Minor enhancements:

  • You have a couple of columns defined numeric(9,0) or numeric(4,0). Unless you have a good reason to do that, you are probably a lot better off using just integer. They are smaller and faster overall. You can always add a check constraint if you really need to enforce a maximum.

  • I would also use text instead of varchar(n).

  • And reorder columns (at table creation time). As a rule of thumb, place fixed-length NOT NULL columns first: timestamp and integer columns before numeric or text columns. More here.
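Putting these hints together, a reworked t2 might look like the sketch below. This is only an illustration: the CHECK constraint stands in for the old numeric(4,0) precision limit, and the assumed value ranges are mine, not from your question.

```sql
CREATE TABLE t2
( c3               integer PRIMARY KEY,        -- was numeric(9,0)
  c4               integer NOT NULL
                     REFERENCES t1 (c1) ON DELETE CASCADE,
  tver             integer NOT NULL
                     CHECK (tver BETWEEN 0 AND 9999),  -- was numeric(4,0)
  dmlastupdatedate timestamptz NOT NULL,       -- fixed-length NOT NULL columns first
  c1               text,                       -- was varchar(100)
  c2               text,
  CONSTRAINT t2_uni_key UNIQUE (c3, c4)
);
```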

Erwin Brandstetter
  • so will the changing of numeric(9,0) and numeric(4,0) to integer affect in the performance? after changing shall i keep one index and see – user2509885 Jun 21 '13 at 19:58
  • @user2509885: It certainly effects performance, but only in a minor way. The 10 min runtime you are reporting must come from something that is not in your question. – Erwin Brandstetter Jun 21 '13 at 20:02
  • ok one more thing i wanted to ask am i creating the indexes correctly? and can we add indexes after the table is created that is before we start to dump data – user2509885 Jun 21 '13 at 20:05
  • 1
    @user2509885: It's actually *best* to create indexes after bulk inserts, because that is a lot faster. You can create and delete them any time. You can also do that for primary keys, foreign key and unique constraints. – Erwin Brandstetter Jun 21 '13 at 20:08
  • @user2509885: I added some more minor hints. – Erwin Brandstetter Jun 21 '13 at 20:34
  • indexes solved the problem now the performance has increased a lot it hardly takes seconds to delete .but there is another issue that has come .now ingestion of data is taking more time because of the indexes is there anything that could be done because in ingestion there are approximately 1 to 2 lakhs rows getting updated in the entire database including all tables – user2509885 Jun 21 '13 at 20:36
  • ingestion earlier took hardly 8 seconds .Acutally data ingestion is done from a java program using jdbc connection – user2509885 Jun 21 '13 at 20:40
  • @user2509885: If you are inserting a substantial part of the table, it's faster to drop all indexes and constraints and recreate them afterwards as I already mentioned. Also, it's *much* faster to do a bulk INSERT with COPY. [More in this related answer.](http://stackoverflow.com/questions/9423539/optimizing-performance-of-postgresql-database-writes-in-django/9423888#9423888) You probably don't need the mentioned temp. staging table. There are many related answers on SO. – Erwin Brandstetter Jun 21 '13 at 20:46
  • what i do at present is create a batch of prepared statements and execute at regular intervals in prepared statement i use insert will copy work there ? and since this is a web application i cannont drop all indexes and recreate all the time because one user might go for deletion at the same time other will go for updatation – user2509885 Jun 21 '13 at 20:57
  • @user2509885: [About `COPY`.](http://stackoverflow.com/questions/16991383/postgres-copy-from-variable-with-csv-data/16991854#16991854) [About populating a database.](http://www.postgresql.org/docs/current/interactive/populate.html) – Erwin Brandstetter Jun 22 '13 at 03:17
  • the mistake i made when creating indexes is i created mulitple indexes in a table because of which insertion took a lot time .now i have analysed the delete command more deeply and created the index now both insertion and deletion performance has increased a lot . I love postgres.. thanks a lot – user2509885 Jun 22 '13 at 09:16
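Re the ingestion discussion above: for large bulk loads the usual pattern is to drop secondary indexes, load with COPY, then recreate them and refresh statistics. A sketch (the file path is a placeholder, and server-side COPY FROM a file requires superuser privileges; the index matches the question's t2_index):

```sql
DROP INDEX IF EXISTS t2_index;               -- drop secondary indexes before the load

COPY t2 (c1, c2, c3, c4, tver, dmlastupdatedate)
FROM '/path/to/t2_data.csv' WITH CSV;        -- bulk load, much faster than row-by-row INSERT

CREATE INDEX t2_index ON t2 (c4);            -- recreate after the load
ANALYZE t2;                                  -- refresh planner statistics
```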