
I have an issue inserting text extracted from a PDF into a MySQL table. The error message is: "Incorrect string value: '\xF0\x9D\x9B\xBC i...' for column 'text' at row 1"

I know that this code refers to the Greek letter alpha. However, I have set the character set to UTF-8 both for the column text and in the MySQL connection. I have also tried utf8mb4. Neither worked.

The Greek letter alpha occurs in different fonts in the PDF. I am not sure whether this matters.

Any ideas why this does not work?

I also created a PDF file myself that contains an alpha in the text. For this example, my programme runs without any errors. So although the error message refers to the alpha, there seems to be an additional issue.

Thanks in advance!

UPDATE: After some checking, I found that some really strange symbols were produced from a formula that contained the Greek letter alpha. Apparently these unknown symbols caused the error. However, I still do not know how to exclude unknown symbols from the text. What is the easiest way to do this?

These are the symbols: [image: unknown symbols]

lorenzbr
  • Use the MySQL BLOB data types. – Raymond Nijland Sep 07 '17 at 12:28
  • And make sure you connect with the utf8 charset: https://stackoverflow.com/questions/3275524/java-mysql-utf8-problem – Raymond Nijland Sep 07 '17 at 12:35
  • I do not get an error anymore. However, it looks like there is no text data in this BLOB-type table record. I also tried to convert it using SELECT CONVERT(text USING utf8) FROM table; but there seems to be no information stored in this record (it is a NULL entry). – lorenzbr Sep 07 '17 at 12:48
  • Please see the update in my original post. Shouldn't the BLOB type take care of any kind of unknown symbols? Unfortunately, this did not work for me. – lorenzbr Sep 07 '17 at 14:18
  • A PDF is binary data, incompatible with UTF-8. https://stackoverflow.com/questions/10729824/how-to-insert-blob-and-clob-files-in-mysql – Joop Eggen Sep 07 '17 at 14:33
  • BLOB does work if these unknown symbols are removed from the PDF text; I simply have to use CONVERT to get the text back. But then I can basically use LONGTEXT as well. That is not my problem anyway. The problem is how to automatically remove unknown symbols from a string (either in Java or in MySQL). See the update above. – lorenzbr Sep 07 '17 at 14:39

2 Answers


I restricted the string in Java to Latin characters only. That may not be the most general way of getting rid of those strange symbols, but it works for now.
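A minimal sketch of what such a filter could look like. The class and method names (TextSanitizer, keepLatin, stripSupplementary) are made up for illustration and are not the code actually used here. The first method keeps only Latin letters, ASCII digits, ASCII punctuation and whitespace. The second is a less aggressive alternative that only drops characters outside the Basic Multilingual Plane, i.e. the ones that need 4 bytes in UTF-8, such as the \xF0\x9D\x9B\xBC from the error message (U+1D6FC).

```java
final class TextSanitizer {

    // Hypothetical helper: remove everything that is not a Latin-script letter,
    // an ASCII digit, ASCII punctuation or whitespace.
    // Java regexes match by code point, so a surrogate pair is removed as a whole.
    static String keepLatin(String text) {
        return text.replaceAll("[^\\p{IsLatin}0-9\\p{Punct}\\s]", "");
    }

    // Hypothetical alternative: drop only supplementary code points (above U+FFFF),
    // which are the ones a 3-byte MySQL utf8 column cannot store.
    // Greek, accented letters and other BMP characters are kept.
    static String stripSupplementary(String text) {
        return text.codePoints()
                .filter(cp -> !Character.isSupplementaryCodePoint(cp))
                .collect(StringBuilder::new,
                         StringBuilder::appendCodePoint,
                         StringBuilder::append)
                .toString();
    }
}
```

Calling TextSanitizer.keepLatin(pdfText) before the INSERT would discard the formula symbols. Note that both variants lose information, so they are a workaround rather than a fix for the encoding itself.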

lorenzbr

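In MySQL, use CHARACTER SET utf8mb4 for the column. The bytes \xF0\x9D\x9B\xBC form a 4-byte UTF-8 sequence (U+1D6FC, mathematical italic small alpha), and MySQL's utf8 character set stores at most 3 bytes per character.

Add ?useUnicode=yes&characterEncoding=UTF-8 to the JDBC URL.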
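As a rough sketch, the two steps could be combined like this. The table name documents, the class Utf8mb4Insert and its methods are assumptions for illustration; only the column name text and the URL parameters come from the posts above. Whether characterEncoding=UTF-8 results in utf8mb4 on the connection can depend on the Connector/J version and the server's character_set_server setting, so that is worth checking for your setup.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class Utf8mb4Insert {

    // One-time schema change, run in MySQL (table name is an assumption):
    //   ALTER TABLE documents CONVERT TO CHARACTER SET utf8mb4;

    // Builds the JDBC URL with the parameters suggested above.
    static String jdbcUrl(String hostAndPort, String database) {
        return "jdbc:mysql://" + hostAndPort + "/" + database
                + "?useUnicode=yes&characterEncoding=UTF-8";
    }

    // Inserts the extracted PDF text through a prepared statement,
    // so the driver handles the encoding of the parameter.
    static void insertText(String url, String user, String password, String text)
            throws SQLException {
        String sql = "INSERT INTO documents (`text`) VALUES (?)";
        try (Connection conn = DriverManager.getConnection(url, user, password);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, text);
            ps.executeUpdate();
        }
    }
}
```

If the insert still fails after this, running SHOW CREATE TABLE on the table is a quick way to confirm that the column itself, and not just the table default, is utf8mb4.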

Rick James