1

I want to fill a table with data that might be overlapping. I do this a few times in the code. so, the first time it is

INSERT INTO A (SELECT * FROM B)

and then the second time it is

INSERT INTO A
SELECT * FROM C
WHERE NOT EXISTS (SELECT * FROM A Where A.field = C.field)

the re-insertion is done a total of 3 times. I could change the code to just one insertion using a UNION on the 4 sub-tables, but it will require alot of re-design, so I wanted to know how much performance penalty I am receving right now

Thanks!!

knittl
  • 246,190
  • 53
  • 318
  • 364
n00b programmer
  • 2,671
  • 7
  • 41
  • 56
  • http://stackoverflow.com/questions/7077172/if-mysql-db-table-does-not-contain-row-with-specific-id-add-data-to-the-table/7077314 – nobody Aug 17 '11 at 08:40

2 Answers2

1

A union is probably less costly. But as usual with databases, it depends.

Why?

Because what you are doing right now is this:

  1. Scan table B and insert into A.
  2. Scan table C and insert into A (where not exists).
  3. Scan table D and insert into A (where not exists).

With a union you would be doing this:

  1. Scan table B.
  2. Scan table C.
  3. Scan table D.
  4. Insert unique values to table A.

I.e. your current queries scans table B, C, D and table A twice plus overhead for three separate queries. A union-query would scan table B, C, D, and sort the rows (to get unique values) and then insert them into table A. At a first glance it seems that union would be faster because you are doing two less scans and only one insert (and thus less locking).

What I mean by it depends is:

Indexes: Correctly indexed, seeks might be faster than sorting the data from B, C and D.

Amount of data: If you have little data compared to the hardware you are running the database on then this discussion might be pointless. I.e. there's no point in optimizing parts that are not bottlenecks.

Andreas Ågren
  • 3,879
  • 24
  • 33
0

You might want to look at INSERT IGNORE ... also if you have a suitable UNIQUE KEY constraint in the destination table.

Just another idea... No idea about the performance differences though, probably depends on your table structures, indexes and data. I would just benchmark the different methods myself and choose most suitable one.

snap
  • 2,751
  • 22
  • 33