7

I have a query for remove all special characters.
But ONE space resists to that query at the end of email string.

Example : 'test@hotmail.com '

UPDATE my_table SET email= REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(TRIM(LTRIM(RTRIM(email))),\'\x0B\',\'\'),\'\0\',\'\'),\'\t\',\'\'),\'\r\',\'\'),\'\n\',\'\'),\'\r\n\',\'\'),\'\n\r\',\'\'),\' \',\'\'),CHAR(160),\'\') WHERE id=X

Why?

I use this statement because I have a WHERE id IN(), so I don't want to process special characters in PHP. I want to UPDATE every emails directly with SET and replace, trim() function.

However, some whitespace is not deleted and I don't know why.

My table has approximately 12 millions of rows. I have programmed a CRON which fetch them to delete all specials characters (unfortunately because in the past we don't had check them on INSERT).

So I have build this query to process my 12 MM rows. It works very great except the right whitespace (sometimes it is removed sometimes not). And I want to add that on Workbench, the query works 100% all the time. It does not make sense.

Here is my query again without backslash and with my where IN:

UPDATE NEWSLETTER_SUBSCRIPTION SET email= REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(TRIM(LTRIM(RTRIM(email))),'\x0B',''),'\0',''),'\t',''),'\r',''),'\n',''),'\r\n',''),'\n\r',''),' ',''),CHAR(160),'') WHERE id IN (' . implode(',', $idEmailToBeProcess) . ')

$idEmailToBeProcess contains around 500 ids.

I think the right whitespace it's a non-breaking space, but my last test with CHAR(160) in my query didn't work.

FOX 9000
  • 123
  • 1
  • 1
  • 6
coolfarmer
  • 505
  • 6
  • 16
  • Why not just use a function that's specifically built for this task in whatever language you're creating the query in? IE. in PHP there is [trim](http://www.php.net/manual/en/function.trim.php). – tenub Jan 07 '14 at 18:55
  • What's your parent language? PHP? Not sure why you'd go through the effort to make that ugly nested SQL statement when you could use something like `preg_replace` in php to do it in one go. – brandonscript Jan 07 '14 at 18:55
  • @tenub There are three trims in the middle of the replace nest though. – Jerry Jan 07 '14 at 19:17
  • I just edit my post to added some information and some detail about my workflow. Thanks guys for helping me. ^^ – coolfarmer Jan 07 '14 at 19:44
  • @coolfarmer Is it maybe an [unicode nbsp](http://www.fileformat.info/info/unicode/char/a0/index.htm)? Check if this returns 1 or 0: `SELECT * FROM tab WHERE id = test_id AND hex(col_name) LIKE '%C2A0%'`, Retry the query with E28083, E28082, E280AF, EFBBBF. These are different kinds of unicode nbsp, en space, em space,... If one is positive, you got it. – Jonny 5 Jan 07 '14 at 20:39
  • You must remove char 194 before you can remove char 160. `REPLACE(REPLACE(value, CHAR(194), ''), CHAR(160), '')` Only Zalgo knows why. – Heath Dutton Aug 26 '19 at 19:42

3 Answers3

2

how about whitelisting? ie allow only valid characters

regex_replace [^-_.@a-zA-Z] with ''

mzzzzb
  • 1,422
  • 19
  • 38
2

Ok, finally I had found the problem !!!

Encoding of PDO is the problem...

Just adjusted driver options and all works good!

PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES \'UTF8\'')

Thanks guys anyway!

coolfarmer
  • 505
  • 6
  • 16
1

Found that on MySQL checking against CHAR(160) does not work.For UTF-8 NBSP the below worked for me:

REPLACE(the_field, UNHEX('C2A0'), ' '). 

Solution provided on similar stack overflow question Whitespace in a database field is not removed by trim()

GotaloveCode
  • 994
  • 1
  • 13
  • 30