0

I'm trying to import data into my table. The source is CSV in UTF8 encoding into my MySQL UTF8MB4 table.

Originally i thought the encoding was wrong for the language which is Russian (for this failed row) but it turns out its a slash "/" in the string (or any other similar chars) and when i remove it then insert works to that point.

This data is multi language and has emoji too, what is the best way to handle the slash in the double quote enclosed string?

For example the slash in this is the problem "й\с"

When i run

SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'

I get

Variable_name Value
character_set_client utf8mb4
character_set_connection utf8mb4
character_set_database utf8mb4
character_set_filesystem binary
character_set_results utf8mb4
character_set_server latin1
character_set_system utf8
collation_connection utf8mb4_general_ci
collation_database utf8mb4_unicode_ci
collation_server latin1_swedish_ci
Luuk
  • 12,245
  • 5
  • 22
  • 33
user15793580
  • 1
  • 1
  • 2
  • a slash `/` is something different than a back-slash `\\`, see: [Difference between forward slash (/) and backslash](https://stackoverflow.com/questions/38428561/difference-between-forward-slash-and-backslash-in-file-path) – Luuk May 07 '21 at 17:03
  • The back-slash in your example act as an escape character, and is not needed in your example. One explanation for a back-slash is here: https://stackoverflow.com/questions/12091506/what-is-the-backslash-character – Luuk May 07 '21 at 17:08
  • So i'm using a dataset which has / inside text strings as people type whatever they want. So you are saying the / is escaping the UTF8MB4 encoding? is there no way to handle this? i assumed as in show variables above it had UTF8 for character_set_system? – user15793580 May 07 '21 at 17:28
  • Actually the error only happens if i have the text after the slash like below: China\Принимающий If i put a space before Принимающий it works.. so the problem only happens with the \ and Russian letters beside each other with no space. – user15793580 May 07 '21 at 17:33
  • 1) What is the character set *of the tables*? 2) What about that [character_set_server](https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_character_set_server) = latin1? – Schwern May 07 '21 at 18:56
  • 3) What is the error you're getting? 4) Are you sure your CSV library is handling the \ correctly? There's nothing special to utf8mb4 about a \. [Demonstration](https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=649bab70b5fdf345f4316631770bc631). – Schwern May 07 '21 at 19:00
  • Thanks for the help schwern, I'm really stuck at this and trying ti fix asap. Your example had two slashes. If the text has one slash it gets removed when i query the record, is there a way to make it inserted with 2 slashes? – user15793580 May 08 '21 at 09:22
  • Also Schwern 1. They are utf8mb, 2. Character_set_server i want to be uft8mb but it's in aws Aurora so I'm not sure how to set it. I'm not sure either if this is the problem. 3 error is illegal uft8mb char just before the slash but when i put a space between the slash and the Russian text it works fine.. I've no idea what this means or how it should be handled.4. i have a CSV from a data vendor who are not very helpful so I'm not sure. – user15793580 May 08 '21 at 09:26
  • I added ESCAPED BY '' (two single quotes) to the import, randomly and it worked on the row and inserted it correctly, any idea why this is? – user15793580 May 08 '21 at 14:14
  • @user15793580 The example has two slashes because backslash is the escape character. `'\''` is one quote. `'\n'` is a newline. ```'\\'``` is a single backslash. Normally this is not necessary because you pass values to the SQL server like you would to a function. But if you're building the query using string concatenation this leads to your escaping problem. Also its insecure. See [SQL Injection](https://www.w3schools.com/sql/sql_injection.asp). The problem is not with MySQL, but your code which is doing the insert. Can we see that code, please? – Schwern May 08 '21 at 19:02

0 Answers0