1

I'm trying to unique row based on multiple columns combine and without using the index/key to do insert update duplicate row. What I need is like INSERT ... ON DUPLICATE KEY, but I can't add index due to the index size limit.

Here is my table design

Table : keyword_content

+--------------------+---------------------+------+-----+---------+----------------+
| Field              | Type                | Null | Key | Default | Extra          |
+--------------------+---------------------+------+-----+---------+----------------+
| id                 | int(10) unsigned    | NO   | PRI | NULL    | auto_increment |
| group_id           | int(10) unsigned    | NO   |     | NULL    |                |
| encoding           | tinyint(3) unsigned | NO   |     | 0       |                |
| content            | text                | NO   |     | NULL    |                |
| active_period      | datetime            | YES  |     | NULL    |                |
| status             | tinyint(3) unsigned | NO   |     | 1       |                |
| flag               | tinyint(1)          | YES  |     | 1       |                |
| description        | text                | YES  |     | NULL    |                |
| updated_by         | varchar(255)        | YES  |     | NULL    |                |
| updated_at         | datetime            | YES  |     | NULL    |                |
+--------------------+---------------------+------+-----+---------+----------------+

Columns use for unique row are:

  • group_id
  • content
  • status

I've also tried to handle the process in PHP by separate out what to insert/update, but it cause performance issue due to quantities of data import from spreadsheet.

Deno
  • 434
  • 4
  • 16
  • 1
    Possible duplicate of ["INSERT IGNORE" vs "INSERT ... ON DUPLICATE KEY UPDATE"](https://stackoverflow.com/questions/548541/insert-ignore-vs-insert-on-duplicate-key-update) – Farrukh Sarmad Feb 02 '18 at 07:49
  • I can't add any key to make it unique due to the index size limit. Anyway, thank for your comment. – Deno Feb 02 '18 at 08:07

1 Answers1

1

From what I understand, you cannot add a combined index on the 3 columns because that will get too big because of the content.

As you really need an index both for efficient lookup and your inserts, I would recommend adding a column to the table where you store a hash of the combination of the 3 columns and add a unique index on that.

Obviously any hash has a theoretical risk of collisions, but using a strong hash that is unlikely to be a problem.

jeroen
  • 91,079
  • 21
  • 114
  • 132
  • I've tried to add a column with a unique index to store 3 columns in hash format (md5,gzdeflate). But I found that md5 have problem while hashing unicode content, and gzdeflate had problem to measure the size, I'm worried the size exceeds the limit. Do you have any hash format recommended? – Deno Feb 02 '18 at 08:38
  • 1
    @Deno I would definitely not use `md5` - that really has a high risk of collisions - but something like `sha256` instead. Using `hash()` you should be able to hash any string. See for example https://3v4l.org/h3ue5, that's about as unicode you can get :-) – jeroen Feb 02 '18 at 08:43