
In the world I live in, building a solution that throws all the darts at a wall and hopes some hit the bull's eye is a pretty poor methodology.

So my question arises: when is using INSERT IGNORE in a production system acceptable by convention?

I have a situation in which I'd like to write a TRIGGER (see #17353080) that searches for existing data and, if it is not found, INSERTs it. A suggestion was to use INSERT IGNORE and rely on primary keying of the unique fields.
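
To make the trigger idea concrete, here is a minimal sketch of what I mean (the `staging` and `unique_values` tables and the `val` column are hypothetical, not my actual schema):

```sql
-- After a row lands in staging, copy its value into unique_values
-- only if it is not already there.
DELIMITER //
CREATE TRIGGER copy_if_new AFTER INSERT ON staging
FOR EACH ROW
BEGIN
  IF NOT EXISTS (SELECT 1 FROM unique_values WHERE val = NEW.val) THEN
    INSERT INTO unique_values (val) VALUES (NEW.val);
  END IF;
END //
DELIMITER ;
```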

So, I have a predicament:

  1. Do I use INSERT IGNORE and rely on primary keying the fields I'd like to keep unique, silently failing any INSERTs of repeated values?
  2. Do I repeatedly query the target table to check whether the field values I'd like to keep unique are already present, and INSERT only if they are not? (A sketch of both options follows.)
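
To make the two options concrete (same hypothetical `unique_values` table, with a unique or primary key on `val`):

```sql
-- Option 1: let the unique key silently reject duplicates.
INSERT IGNORE INTO unique_values (val) VALUES ('abc');

-- Option 2: check first, then insert. Written here as one atomic
-- statement; as two separate statements it would need a transaction
-- to be safe under concurrent writers.
INSERT INTO unique_values (val)
SELECT 'abc' FROM DUAL
WHERE NOT EXISTS (SELECT 1 FROM unique_values WHERE val = 'abc');
```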

Since these questions are too fuzzy to ask in this specific medium, I'd like to understand when it is acceptable by convention to use INSERT IGNORE, so that I can make my own judgement in my specific case.

Thanks,

Matt

Meta:

I take it that since `conventions` is an existing tag, described as:

A generic tag covering any accepted method of doing things, which could include naming, spacing, coding, commenting, etc.

that questions of this type are acceptable.

brandeded
  • This will however fall into their "opinion only" flagging rules... Personally, this type of feature should never be included in a serious engine. Stored procedures should be used whenever there is a need to validate data being inserted, and duplicates should be ignored that way. This just seems like lazy programming to me. I write for a telco billing software company, and if I attempted to use something like this, my head would be on the chopping block, as it can lead to invisible data inaccuracies. – Trent Jun 29 '13 at 16:37
  • I believe it is never a good practice, but it can be convenient for fixing errors/problems when something has already gone wrong. – Wolph Jun 29 '13 at 16:43
  • I don't agree that this question only has acceptable `opinion`-driven answers, since it can be quantified (the latency of queries [and its effects] would be one measurement). I am in a situation where I need the fewest possible data inaccuracies, so I sincerely appreciate your input. Can you speak a bit more as to why you recommend your methodology? Does it imply that a TRIGGER should call a procedure (which would block anyway)? Note that the client performing the INSERTs has effective buffering. Please post this as an answer, as I think you're on the right track. – brandeded Jun 29 '13 at 16:50

2 Answers


Logically, I would say that it depends on what you need to do in the event of failure.

If you are just performing batch inserts and don't care whether a value is already in the table - as long as it remains unique - then INSERT IGNORE is sensible. As the term suggests, you are happy to ignore anything that isn't inserted.

If you need to do something with the failed inserts, such as advising your user, then performing a separate check is necessary.
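
For example, in MySQL you can ask how many rows the last statement actually changed, which reveals a silently ignored duplicate (the table name here is illustrative):

```sql
INSERT IGNORE INTO unique_values (val) VALUES ('abc');
SELECT ROW_COUNT();  -- 1 = the row was inserted, 0 = it was ignored
```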

The most important question, however, is: "why are the values already in the table?"

Andy G
  • The answer to "why are the values already in the table?": I am recording (source address + destination address) pairs from network flow software. I wish to track whether the pairs have occurred previously, and to place a weight on a given flow if they haven't occurred. It's part of an intrusion detection system I am trying to develop, relying on flow data produced by [the argus software "system"](http://qosient.com/argus/). I will also be inserting a last-modified time into a status table, and tracking this last-modified time with a client. – brandeded Jun 29 '13 at 16:54

For me, the alarm bells on these "features" are their silence. In the billing world, when you receive multiple records that are "unique" to your understanding, you want to know about them. It is the case in telecommunications that a switch will send records with 30-100 columns, of which 5 make the record unique; however, carriers do change this from time to time. The issue here is that what WAS unique no longer is, and we need to know about it.

The other issue is: why, in a production system, is it ever really OK to just be satisfied with ignoring duplicates? At the very least, we would have an audit table that logs duplicate records for someone to review, as they typically suggest some type of problem higher up the stream of incoming data.
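
As a rough sketch of that audit pattern in MySQL (the `flows` and `flow_duplicates` tables are hypothetical), a stored procedure can trap the duplicate-key error (1062) and log the offending record instead of silently dropping it:

```sql
DELIMITER //
CREATE PROCEDURE insert_flow(IN p_src VARCHAR(45), IN p_dst VARCHAR(45))
BEGIN
  -- On a duplicate-key error (ER_DUP_ENTRY, 1062), log the record
  -- for review instead of silently discarding it.
  DECLARE CONTINUE HANDLER FOR 1062
    INSERT INTO flow_duplicates (src, dst, seen_at)
    VALUES (p_src, p_dst, NOW());

  INSERT INTO flows (src, dst) VALUES (p_src, p_dst);
END //
DELIMITER ;
```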

As for performance... with modern hardware, is this really a question? Anything doing massive updates would likely not be heavily client-interactive (it is more likely a background process); and if it is client-interactive, you would have a database server design set up to handle the situation, such as table partitioning, prioritisation, etc.

Further to this, the caveats of INSERT IGNORE make it a horrible feature; it silently downgrades errors such as the following to warnings (the first is demonstrated in the sketch after this list):

  • Inserting a NULL into a column with a NOT NULL constraint.
  • Inserting a row into a partitioned table when the values you insert don't map to any partition.
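
A quick demonstration of the first caveat (hypothetical table):

```sql
CREATE TABLE t (id INT PRIMARY KEY, val INT NOT NULL);
INSERT IGNORE INTO t (id, val) VALUES (1, NULL);
-- Instead of failing, MySQL raises only a warning and stores the
-- column's implicit default (0 for INT); the NULL is silently rewritten.
SHOW WARNINGS;
```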

I would certainly want to know if data came in NULL where it was not supposed to...

Again, this command just seems like a "good idea at the time", and it is what I consider Mickey Mouse programming.

Trent