2

I have some weird problem with MySQL. I am trying to match two strings, but in one string there is extra characters.

Initial string looks like this 'Ascot ' When I select:

select ascii(substring(name, 1, 1)), ascii(substring(name, 7, 1))

I get 194, 194. But when I replace:

select replace(name, char(194), '' )

it shows as '?Ascot?' in phpMyAdmin and no matching is done. Can someone please help me with this?

Problem column is defined as utf8mb4_unicode_ci. I am trying to match this with column from another table defined as utf8_general_ci. Tried to change utf8_general_ci to utf8mb4_unicode_ci but with no results.

When I do substring(name, 2, 5) then it matches. So the solution should be to replace those characters.

EDIT:

I tried the following function to remove non alphanumeric characters and it seems to work now:

BEGIN
  DECLARE i INT DEFAULT 1;
  DECLARE v_char VARCHAR(1);
  DECLARE v_parseStr VARCHAR(255) DEFAULT ' ';

WHILE (i <= LENGTH(prm_strInput) )  DO 

  SET v_char = SUBSTR(prm_strInput,i,1);
  IF v_char REGEXP  '^[A-Za-z0-9 ]+$' THEN  #alphanumeric

        SET v_parseStr = CONCAT(v_parseStr,v_char);  

  END IF;
  SET i = i + 1;
END WHILE; 
RETURN trim(v_parseStr);
END

But this is extremely inefficient...

Isaac Bennetch
  • 11,830
  • 2
  • 32
  • 43
Giorgi Nakeuri
  • 35,155
  • 8
  • 47
  • 75
  • Check this [answer](http://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci). This might be of some help. –  Feb 23 '16 at 07:28
  • Do you use a single byte character set or a multi byte one (e.g utf8)? – Shadow Feb 23 '16 at 07:29
  • @Shadow, how can I see that? I updated my question with some collation used if you didn't see that. – Giorgi Nakeuri Feb 23 '16 at 07:32
  • 2
    Both utf8 and utf8mb4 are multi byte character sets. ascii() function basically tell you the value of the leftmost byte within the string of bytes that make up the text. The leftmost byte in a multi byte encoding does not necessarily equal to the leftmost character. You should determine what character is on the either side of the string and you can use trim() to cut them off. – Shadow Feb 23 '16 at 11:26

1 Answers1

-2

Use this before your mysql query:

mysql_query("SET NAMES 'utf8'");
Neeraj Kumar
  • 506
  • 1
  • 8
  • 19