How to normalize whitespace for user-generated data imports in MySQL/PHP

Question

I am building a web app that allows users to import CSV files into the database. It parses those uploads into string values. I had a case where I uploaded two different files that had the string "New Hartford". When I ran a SELECT statement to get all instances of this string, I noticed that what appears to be an identical string value results in pulling up different rows in the query results:

SELECT * from divisions WHERE name = 'New Hartford'; 

SELECT * from divisions WHERE name = 'New Hartford';

Upon further investigation I found that it was the whitespace character between "New" and "Hartford". Apparently there is something different about one of them. I assume it is a character encoding issue, a control character, or perhaps a different form of a whitespace character. Two questions: [1] How can I detect these variations either before or after I insert the data? [2] How can I normalize white-space characters efficiently and elegantly before or during insert? (I'd ideally like a quick function in the insert statement itself, I think.)

For extra context, I am using MySQL Server 8.0, with the following charsets and collations:

SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
[result]
character_set_client        utf8
character_set_connection    utf8
character_set_database      utf8mb4
character_set_filesystem    binary
character_set_results       utf8
character_set_server        utf8mb4
character_set_system        utf8
collation_connection        utf8_general_ci
collation_database          utf8mb4_0900_ai_ci
collation_server            utf8mb4_0900_ai_ci

You can use a regex to replace all whitespace, see [this](https://stackoverflow.com/a/40048457/231316) — Chris Haas, Jul 02 '20 at 17:31
There are `U+0020` _Space_ and `U+00A0` _No-Break Space_ in your sample strings, respectively. — JosefZ, Jul 02 '20 at 18:59

How to normalize whitespace for user-generated data imports in MySQL/PHP

0 Answers0