0

I have a table namely "tags". The structure of which is as follows:

CREATE TABLE `tags` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) COLLATE utf8_bin NOT NULL,
  `value` varchar(255) COLLATE utf8_bin NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`),
  UNIQUE KEY `tags_name_value_unique` (`name`,`value`),
  KEY `tags_name_index` (`name`),
  KEY `tags_value_index` (`value`)
  ) ENGINE=InnoDB AUTO_INCREMENT=1380 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

Getting the similar errors for different values. The error is frequent but random

SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'screen_name-Erika S' for key 'tags_name_value_unique' (SQL: insert into tags (name, value, updated_at, created_at) values (screen_name, Erika Söll, 2018-08-29 08:21:55, 2018-08-29 08:21:55))

In this case, the value Erika Söll was not present in the DB whereas the value Erika S was available in DB. I have the following questions:

  • The insertion is done with value Erika Söll so why the unique constraint giving error for Erika S?
  • There are thousands of such errors and after analyzing them it seems that the value is matched before an occurrence of any German character like ö, ü, ß, etc. Why is the indexes behaving like this?
  • Is this something to do with using both the unique constraint and index together? In this case the index was created first and unique constraint was added later
  • How can this be fixed?

Queries for selecting and inserting values are as follows:

select * fromtagswhere (name= 'screen_name' andvalue= 'Klaudia-Krauß') limit 1

insert intotags(name,value,updated_at,created_at) values ('screen_name', 'Klaudia-Krauß', '2018-09-03 04:08:46', '2018-09-03 04:08:46')

Rick James
  • 135,179
  • 13
  • 127
  • 222
developer34
  • 123
  • 1
  • 1
  • 10
  • please show how your are executing the query. Might have something to do with charset conversion at that point – e4c5 Aug 31 '18 at 14:37

2 Answers2

1

When inserting into the table, "Erika Söll" was truncated to "Erika S", that is, just before the first accented letter. This is a character set issue. More discussion here, to whit:

  • The bytes to be stored are not encoded as utf8. Fix this.
  • Also, check that the connection during reading is utf8.

Without knowing whether you are using PHP, Java, Python, etc, I can't be more specific. However, there may be tips here .

Meanwhile, probably you don't need id at all, and the 4 indexes can be simplified to two:

PRIMARY KEY(`name`,`value`),
KEY `tags_value_index` (`value`)
Rick James
  • 135,179
  • 13
  • 127
  • 222
0

I was facing the issue with the combination of PHP and MySQL. The issue was related to the data encoding. The data I was receiving was a mix of UTF8 and other encoding i.e., ISO-8859-1 in my case. So as a fix, I wrote the following code which solved my problem

public static function convertToUTF8($input)
{
    return !mb_check_encoding($input, 'UTF-8') ? utf8_encode($input) : $input;
}

The mb_check_encoding($input, 'UTF-8') check is to avoid encoding strings that are already UTF8 encoded.

developer34
  • 123
  • 1
  • 1
  • 10