0

I import character strings from JSON files to temporary SAS tables (SAS 9.4M7) and then write them to a permanent MySQL table (MySQL Community Server on Linux, 8.0.12). The SAS tables are set as UTF-8, the MySQL table has "utf8_general_ci" as table collation (same on column level for the relevant columns). The import to SAS works without any problems. However when I execute the SAS code that writes to the MySQL table certain lines result in an error: "Execute error: Invalid utf8 character string:"

The character that seems to cause this error is displayed as the replacement character in SAS (black rhombus with question mark). One instance is a euro sign (€) which is encoded as "\u20ac*" in the JSON file. The root cause might be located at the data source; however I require the interface from SAS to MySQL to be reliable, i. e. everything that is successfully read into SAS must also be written to MySQL without errors. So the question is: why does SAS accept a certain set of characters while in UTF-8 but MySQL doesn't?

EDIT: The SAS session also runs in UTF-8 as verified by PROC OPTIONS. One character that seems to cause the problem is this one: https://www.fileformat.info/info/unicode/char/100004/index.htm

mango
  • 17
  • 1
  • 6
  • 1
    Make sure your SAS session is using UTF-8 encoding (not just the encoding of the dataset). What about the connection between SAS and MySQL? What driver are you using? Does it support UTF-8? – Tom Jul 14 '21 at 12:11
  • Please show your code plus what this returns: `proc options option=encoding; run;` – Joe Jul 14 '21 at 15:00
  • Yes, the SAS session also uses UTF-8. I had a problem with a variety of the Euro sign but could fix it by deleting all non-UTF-8 signs from the input table. However there are some strange signs left that SAS accepts, considers valid UTF-8 but still won't load into mySQL. One being this one: https://www.fileformat.info/info/unicode/char/100004/index.htm – mango Jul 14 '21 at 19:08
  • See "black diamond" in https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored – Rick James Oct 28 '21 at 23:42

1 Answers1

0

I had the problem when the database fields where not declared as UTF-8 fields.

Are you using NVARCHAR?

Problem is that the Euro sign has exactly the same value like a "forbidden" character in a different character set. (I think Latin1 or so)

Try to mask the Euro sign with a different character, just for testing.

J0eBl4ck
  • 415
  • 1
  • 4
  • 10
  • MySQL treats `NVARCHAR` as `VARCHAR`. The black diamond sounds like a different problem than the "A0" to which you refer. – Rick James Oct 28 '21 at 23:44