1

I'm using SQL Server and I have a table with an N-1 relationship to another table, and I have a process that updates the information on the N side by removing all the related rows and inserting them again.

I could search the table and check whether the row I'm going to insert already exists, but that would require comparing every column, and it's a lot easier for me to just delete them all and insert them back.

This causes the primary key to grow and grow, and I'm concerned about the performance in the future.

So, the question is: will this slow down queries in the future, making it better to update the rows, or is this not a problem at all?

Thanks

EDIT: I'm not deleting all the rows every time, just the related rows. So if I have 3 new rows to insert, I look up the current ones, delete them, and insert the new ones.
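To make the process concrete, here is a minimal sketch of the delete-and-reinsert step I described (all table, column, and variable names below are made up, not my real schema):

```sql
-- Hypothetical names: dbo.Prices (AdId, Amount, Currency), @AdId
DECLARE @AdId INT = 42;  -- the ad being updated

BEGIN TRANSACTION;

-- Remove every price currently attached to the ad
DELETE FROM dbo.Prices
WHERE AdId = @AdId;

-- Re-insert the new set of prices; the IDENTITY primary key keeps growing
INSERT INTO dbo.Prices (AdId, Amount, Currency)
VALUES (@AdId, 19.99, 'EUR'),
       (@AdId, 24.99, 'USD');

COMMIT TRANSACTION;
```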

lpaloub
  • 868
  • 2
  • 8
  • 21
  • Which DBMS are you using? – Alma Do Aug 29 '13 at 10:34
  • SQL Server, just edited the question – lpaloub Aug 29 '13 at 10:36
  • What is the table structure and queries? Sometimes an `UPDATE` gets carried out as a `DELETE ... INSERT` rather than in place anyway. – Martin Smith Aug 29 '13 at 10:41
  • There is an ad table and a prices table. An ad contains several prices, so the prices table has a foreign key to the ad table. When I update an ad, I delete all the current prices and insert the new ones. This may make the primary key of the prices table grow a lot, and that is my biggest concern – lpaloub Aug 29 '13 at 10:50
  • 1
    Do you have maintenance plans on the database? Do you regenerate indexes? If so, the PK growth will not affect performance, since the number of rows will be the same (see the index-maintenance sketch after these comments). – Luis LL Aug 29 '13 at 10:50
  • Thanks Luis, that is what I wanted to know – lpaloub Aug 29 '13 at 10:53
  • More than the primary key, following this pattern means you will later have a performance issue when querying the table if it is not indexed. – ray Aug 29 '13 at 10:59
  • As you are planning to compare every column to identify the row to update, I understand from you that you do not have a primary key on your table? – Philippe Grondier Aug 29 '13 at 12:16
  • I do have a primary key, but it's an autoincrement one, and it's the combination of every column that makes a row unique, so I would have to compare every one to tell whether the row is the same or not – lpaloub Aug 29 '13 at 12:42
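For reference, a minimal sketch of the kind of index maintenance Luis LL mentions above (the table name is an assumption, mirroring the prices table from the comments):

```sql
-- Rebuild all indexes on the child table, removing fragmentation
ALTER INDEX ALL ON dbo.Prices REBUILD;

-- A lighter-weight alternative for mild fragmentation
ALTER INDEX ALL ON dbo.Prices REORGANIZE;
```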

3 Answers

1

When it comes to SQL Server's internal DML primitives, UPDATE is the cheapest DML operation to perform. DELETE is second. INSERT is the most expensive one.

A combination of insert and delete is never going to be faster. At best, SQL Server executes the UPDATE as a delete+insert internally; in that case you might end up equal. If the update is processed natively (in place), it will be faster. How much faster or slower depends on the table structure and data, as well as on the query plan.

That performance loss might be ok, though. You can trade productivity for performance.
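To illustrate the comparison (with made-up table and column names, not the asker's actual schema), the two approaches look roughly like this:

```sql
-- Illustrative only; dbo.Prices, PriceId, AdId, Amount, Currency are assumptions.
DECLARE @PriceId INT = 1, @AdId INT = 42;

-- In-place UPDATE: a single DML statement
UPDATE dbo.Prices
SET    Amount = 29.99
WHERE  PriceId = @PriceId;

-- DELETE + INSERT: two DML statements, plus a new IDENTITY value every time
DELETE FROM dbo.Prices WHERE PriceId = @PriceId;
INSERT INTO dbo.Prices (AdId, Amount, Currency)
VALUES (@AdId, 29.99, 'EUR');
```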

usr
  • 168,620
  • 35
  • 240
  • 369
  • So, the primary key growing instead of staying "lower" won't reduce performance in the future? I'm not concerned about the speed of the process, so how it's done is not a problem; it's how the database holds up in the future that matters. – lpaloub Aug 29 '13 at 10:40
  • 1
    What do you mean by the primary key growing? You mean that the number increases? That does not impact performance at all. Row count, row size, page fullness and fragmentation do. – usr Aug 29 '13 at 10:48
  • @usr . . . Can you document your first statement? It is not consistent with my experience or knowledge of how databases work. I could never make such a general statement. – Gordon Linoff Aug 29 '13 at 11:32
  • @GordonLinoff I understand your suspicion. My statement is targeted at single-B-tree DML because multi-index DML plans get too complex to make such a general statement. Here is the script that I played with to derive this information: http://pastebin.com/vSCHm5Qz All DML there targets a single CI with 1M rows that fit easily into memory. All computations are simple integer computations (I want to test the infrastructure, not the expression evaluator). – usr Aug 29 '13 at 11:36
  • @usr . . . So you understand, an update can require both an insert and a delete, both on the data pages and in the indexes, and it requires logging both the old data and the new data. It can be the least-performant alternative. – Gordon Linoff Aug 29 '13 at 11:42
  • I understand DML plans, yes. I made the answer more accurate because it was indeed misleading. I cannot imagine a case where a delete+insert would be faster. My benchmark was only targeted at the internal DML primitives. It is not particularly relevant here, I agree. – usr Aug 29 '13 at 11:49
0

Right now you are deleting all the related rows, which means you have a bunch of records related to a record in another table.

Instead of deleting, why don't you update them? You have the primary keys of all those records. It might take more time while developing, but it will reduce your processing time.

Let me give you an example:

Table1

 id name
 1  XXXX
 2  YYYY
 3  ZZZZ

Table2

 pm_id  id   address
 1      1    Address1
 2      1    Address2
 3      1    Address3
 4      2    Address4
 5      2    Address5
 6      3    Address6

Now, if you want to change the records related to id=2 (Table1), you have 2 records from Table2, i.e. pm_id=4 and pm_id=5.

So you can now update those records directly. Let me know if I understood something wrong.
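For instance, something like this (the new address values are just placeholders for whatever the updated data would be):

```sql
-- Update the existing child rows in place, addressed by their primary keys
UPDATE Table2 SET address = 'NewAddress4' WHERE pm_id = 4;
UPDATE Table2 SET address = 'NewAddress5' WHERE pm_id = 5;
```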

Aijaz Chauhan
  • 1,511
  • 3
  • 25
  • 53
  • Yes, so the idea is that I will get id=2 with a set of addresses, and I will delete pm_id=4 and pm_id=5 and insert all the new ones. The problem is identifying whether a row was previously inserted or not, because I would have to compare every column, and anything slightly different would tell me it is new. So, in my case, it's probably a safer decision to do what I'm doing to ensure the consistency of the data – lpaloub Aug 29 '13 at 11:03
  • But you should update all the rows without checking whether they changed or not. If some records changed, they will get replaced, and the other records will also get replaced, but with the same data as earlier. Did you get my point? – Aijaz Chauhan Aug 29 '13 at 11:22
  • Mmm... That's not a bad idea, but imagine that I receive 2 rows and I have 3 in the table; then I would update 2 and delete the other one, no? That makes sense – lpaloub Aug 29 '13 at 12:41
0

I might have overlooked your environment, but if you are using SQL Server 2008 you should definitely be looking at using the MERGE command. See SQL MERGE statement to update data for a simple example. MERGE is easier to use than the alternatives, too.
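A rough sketch of what such a MERGE could look like for the ad/prices scenario from the comments (all table, column, and variable names here are assumptions):

```sql
DECLARE @AdId INT = 42;
DECLARE @NewPrices TABLE (AdId INT, Amount DECIMAL(10, 2), Currency CHAR(3));

-- @NewPrices holds the incoming set of prices for one ad; MERGE then
-- updates, inserts, or deletes rows in dbo.Prices so the stored set matches.
MERGE dbo.Prices AS target
USING @NewPrices AS source
    ON  target.AdId = source.AdId
    AND target.Currency = source.Currency
WHEN MATCHED AND target.Amount <> source.Amount THEN
    UPDATE SET Amount = source.Amount
WHEN NOT MATCHED BY TARGET THEN
    INSERT (AdId, Amount, Currency)
    VALUES (source.AdId, source.Amount, source.Currency)
WHEN NOT MATCHED BY SOURCE AND target.AdId = @AdId THEN
    DELETE;
```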

Community
  • 1
  • 1
Gary Walker
  • 8,831
  • 3
  • 19
  • 41