4

I'm having a problem with hidden non-ASCII characters (spaces) in my database.

How can I replace them with normal spaces and convert them before being inserted to avoid future problems?

I'm still not 100% sure what's happening, but I think it's with the non-ASCII spaces. Any advice to help track it down will help.


Here's what's happening:

I have a database with keywords and if I search for "test keyword", nothing shows up. I know for a fact that "test keyword" is in the database.

If I search for "test" or "keyword", it will show up.

If I do a query with:

SELECT * FROM keywords WHERE keyword regexp '[^ -~]'; (found here)

It will display "test keyword" - giving me the conclusion there is a non-ASCII character with the space in "test keyword".

Community
  • 1
  • 1
Ricky
  • 276
  • 1
  • 7
  • 22
  • ASCII refers to the first 128 characters in most character sets, the latin alphabet and numbers and such. I don't think it's what you mean to be saying. – Dan Grossman Aug 17 '11 at 05:17
  • An ASCII space is a normal space. Perhaps you could give an example of what problem you're experiencing. – thomasrutter Aug 17 '11 at 05:17
  • 5
    That aside, please describe the actual problem you're having, not *what you think you should fix even though you're not sure what's happening*. Bad questions get bad answers. – Dan Grossman Aug 17 '11 at 05:18
  • Do you mean _control_ or _non-printable_ characters? Spaces are considered both visible and printable, not hidden. – Ray Toal Aug 17 '11 at 05:19
  • @Dan Grossman actually ASCII is the first 128 characters – thomasrutter Aug 17 '11 at 05:19
  • Edited - maybe that makes more sense in what's happening. – Ricky Aug 17 '11 at 05:38

4 Answers4

5

This works with PHP:

str_replace("\xA0", ' ', $keyword)

Now i'm trying to replace all existing ones in the database.

I think this should be working, but it's not:

update keywords set keyword = replace(keyword, char(160), " ") WHERE keyword regexp char(160);

Any ideas?

Community
  • 1
  • 1
Ricky
  • 276
  • 1
  • 7
  • 22
3

I had the same issue and was able to create a update query to replace (in my case) non breaking spaces.

First I analyzed the binary values of the strings that had those chars (I used Mysql workbench 'Open value in editor" to do so). I realized that in my case the char(s) that I wanted to replace had a hex value of 'a0'.

Next I went to this page http://www.fileformat.info/info/unicode/char/a0/charset_support.htm and checked all the encodings that interpret a0 as a non breaking space.

Next I built this query

UPDATE keywords SET keyword = TRIM(REPLACE(keyword, CONVERT(char(160) USING hp8), ' '));

, I chose hp8 but utf8 worked as well.

It took me some time to reach this solution...so I hope this helps someone with the same problem, not to lose his mind trying to figure a solution.

bfilipesoares
  • 147
  • 11
1

How about:

update keywords
set keyword = replace(keyword, char(160), ' ') 
WHERE keyword LIKE concat('%',char(160),'%');
ErichBSchulz
  • 15,047
  • 5
  • 57
  • 61
0

Do you want to remove all non alphanumeric characters?

$string = “Here! is some text, and numbers 12345, and symbols !£$%^&”;

$new_string = preg_replace(“/[^a-zA-Z0-9\s]/”, “”, $string);
apscience
  • 7,033
  • 11
  • 55
  • 89
  • No, I'd like to be able to keep all punctuation. My main problem is with ASCII spaces being used. They show as regular spaces in the database but act differently (cant search for or duplicate characters). Hope that makes sense. – Ricky Aug 17 '11 at 05:32