1

I have a large dB table where I need to search and replace certain char etc. Some of these are special char.

First I am trying just find before changing the statement to Update replace type.

Below runs ok for

$Search_for = '%apple%'; 

But fails on Special char

So for this example we will concentrate on the ™ ( as pasted from the field)

$search_what = 'LongDescription';
$Search_for = '%™%';
SearchToSee($conn,$search_what,$Search_for);

and the function

function SearchToSee ($conn,$search_what,$Search_for) {
                $stmt = $conn->prepare(" SELECT *  FROM table_name WHERE $search_what Like  '$Search_for'  ");
                $stmt->execute();

               foreach ( $stmt as $row ) {
                            print_r ($row);
                }

So how do I format the $Search_for = ?

For reason further on and other systems I have to run each find replace char differently and replace with its own different letters.

So Far I have tried:

in PHP myadmin WHERE LongDescription LIKE '%™%' works !!

in the php:

$Search_for = '%apple%';  works but not special char
$Search_for = '%™%';// Not Working
$Search_for = '%_™%';// Not Working
$Search_for = '™';// Not Working
$Search_for = '%™%';// Not Working

Do I need to change the encoding to pass to SQL ?

Now tried:

        $Search_for2 = '™';
        $Search_for3 = mb_convert_encoding($Search_for2, 'UTF-8', 'UTF-8'); 
        echo  $Search_for3;
        $Search_for = '%'.$Search_for3.'%';

Which echo's â„¢ and works for a proper term like "APPLE" but still not special characters.

Fred
  • 33
  • 8
  • yes you do... and it depends on what encoding your database uses.. if UTF8 then you need to convert your string into UTF8.. see: http://php.net/manual/en/function.iconv.php – RaggaMuffin-420 Mar 03 '15 at 12:38
  • Thanks - db shows "LongDescription text utf8_general_ci" and $Search_for2 = '™'; $Search_for3 = mb_convert_encoding($Search_for2, 'UTF-8', 'UTF-8'); echo $Search_for3; and echo shows â„¢ $Search_for = '%'.$Search_for3.'%'; does not find anything - Ideas ? – Fred Mar 03 '15 at 12:58
  • try removing the last parameter from mb_convert_encoding.. as you are saying the input is UTF-8.. when its not, use the default internal encoding, and you might have more success.. – RaggaMuffin-420 Mar 03 '15 at 13:50
  • thanks - changed to $Search_for3 = mb_convert_encoding($Search_for2, 'UTF-8'); echo now shows ™ but still not finding special char – Fred Mar 03 '15 at 13:55
  • maybe do a regex on your string and only keep `spaces, az A-Z 0-9`? – Andrew Mar 04 '15 at 03:27
  • Thanks but I need to change many of these each in different ways in the dB as others link to the dB so regex wont work :( – Fred Mar 04 '15 at 08:37

3 Answers3

1

First off, you are partially correct. The encoding towards Mysql must be correct. But I guess that is not your problem, as PHP sets this for you IIRC. Just do a select '™'; and see if you are getting the correct feedback in PHP. If not, check your encoding settings.

But secondly where I would suspect the problem to exist, is the collation that you use. Depending on the collation special characters are ignored or seen as others when working with strings.

Collations are set client side, so it could be that the default PHP collation is different from the one set by phpMyAdmin, causing the different behaviour you see.

select * from
  (select 'privé' as word) as t
where word like '%e%'

The above will or will not hit, depending on which collations are used.

But the below will certainly not hit:

select * from
  (select 'privé' COLLATE utf8_bin as word) as t
where word like '%e%' COLLATE utf8_bin;

But the next will again certainly hit:

select * from
  (select 'privé' COLLATE utf8_general_ci as word) as t
where word like '%e%' COLLATE utf8_general_ci;

Try checking your collations and character sets from PHP and phpMyAdmin with:

select
  @@collation_server,
  @@collation_connection,
  @@character_set_server,
  @@character_set_client;

A third suggestion that I can give is to check if you indeed are storing and searching for the exact same character. If I do echo ord('™'); in PHP, I get 226. Could it be that your stored value and value that you search with have different ordinal values? I'm not sure this is possible, but maybe you are using different encodings that have the same character with different ordinal values?

nl-x
  • 11,762
  • 7
  • 33
  • 61
  • Hi - PHPmyadmin SELECT HEX( LongDescription ) , LongDescription FROM table WHERE LongDescription LIKE '%™%' ----- shows 209920 as hex.. ( for only the ™ in the field) Investigating the db further the table is in latin1_swedish_ci ( passed to us in this format ) @@collation_server @@collation_connection @@character_set_server @@character_set_client latin1_swedish_ci utf8_general_ci latin1 utf8 – Fred Mar 04 '15 at 08:39
  • @Fred You should do the `select @@collation_server @@collation_connection @@character_set_server @@character_set_client;` from PHP as well as from PhpMyAdmin , and check if they return the same response ! My guess is that your client settings are not the same. Once you know what charset/collation you need, look at this answer for how to set the correct one: http://stackoverflow.com/questions/13101526/mysql-php-character-set-setting/13101585#13101585 – nl-x Mar 04 '15 at 13:50
0

mysqli_set_charset('utf8') -- or related function.

The snippets of strings you showed imply you have "double encoding".

Do SELECT HEX(col), col FROM tbl WHERE ... to see what was stored for ™. If it is stored in correctly as utf8, you should see e284a2, which, when displayed as latin1, show "â„¢". If it is double-encoded then you will get hex 'C3A2E2809EC2A2' or 'â„¢';

Once we determine whether the data is stored correctly, we can focus on what needs fixing in the INSERT versus the SELECT. Possibly it is in the PHP code.

For more discussion of the problem, see http://mysql.rjweb.org/doc.php/charcoll .

Edit...

OK, I see that you have latin1 encoding for space(20),TM(99),space(20).

Plan A: Everything should be working: If the column is CHARACTER SET latin1 and you used set_charset('utf8') in PHP, then things should have "just worked". This is because that combination should have converted the latin1 x99 to/from the utf8 xE284A2. Are assumptions wrong?

Plan B: Switch to latin1 in PHP settings, html meta, etc.

Plan C: Fix the data in the tables (and probably leave the PHP alone). This probably involves an ALTER to CONVERT the tables. Are the column(s) currently defined as CHARACTER SET latin1? (Do SHOW CREATE TABLE.)

Plan D: Start over. (This involves dropping the tables, recreating them, re-populating them, etc. -- This may be practical if you are just now starting with the database.)

(There is no "double encoding".)

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Hi - PHPmyadmin SELECT HEX( LongDescription ) , LongDescription FROM table WHERE LongDescription LIKE '%™%' ----- shows 209920 as hex.. ( for only the ™ in the field) Investigating the db further the table is in latin1_swedish_ci ( passed to us in this format ) @@collation_server @@collation_connection @@character_set_server @@character_set_client latin1_swedish_ci utf8_general_ci latin1 utf8 – Fred Mar 04 '15 at 08:39
  • Hi Rick - Thanks, I have just finally solved it... The db is all over the place and inherited/ updates daily from a source outside of my control so I cant change it. the setting of the encoding in php was not working as the inherited dB connection ( again not mine ARGG ) was PDO added a second $conn2 mysqli for this function , and now the encoding is set able in both directions. – Fred Mar 04 '15 at 20:40
0

Thanks to Rick and Nl-X Turns out that in the

    <?php require_once(' dB connection.... 

it was PDO which would not allow the encoding to be changed.

So solution add a

     $conn2=mysqli_connect($servername,$username,$password,.......

$search_what =  'Table_Name';
$Search_for2 = '™';
$Search_for = '%'.$Search_for2.'%';
$Replace_with = 'TRADE MARK';

SearchToSee($conn2,$search_what,$Search_for,$Replace_with,$Search_for2);

and the function

function SearchToSee ($conn2,$search_what,$Search_for,$Replace_with,$Search_for2) {
mysqli_set_charset($conn2, 'utf8'); // change as required
mysqli_query($conn2, "SET NAMES 'utf8';");// change as required
mysqli_query($conn2, "SET CHARACTER SET 'utf8';");// change as required
mysqli_query($conn2, "SET COLLATION_CONNECTION = 'utf8_unicode_ci';"); // change as required

// below makes it simple to see what your changing

$result = mysqli_query($conn2, "SELECT *  FROM $table WHERE $search_what     Like '$Search_for'");
$result2 = mysqli_query($conn2, "select
  @@collation_server,
  @@collation_connection,
  @@character_set_server,
  @@character_set_client;");

  foreach ($result2 as $grr) {
                echo '<br>';
                print_r ($grr);// shows result of new settings need to match last line
                echo '<br>Array ( [@@collation_server] => latin1_swedish_ci [@@collation_connection] => latin1_swedish_ci [@@character_set_server] => latin1 [@@character_set_client] => latin1 ) '; // original N/W $grr
            echo '<br>latin1_swedish_ci  ----              utf8_general_ci  ----       latin1      ----- utf8 <br><br>'; // from @@ checks in phpmyadmin on table.
            }

// TO Update
mysqli_query($conn2, "UPDATE Table_Name SET $search_what = replace($search_what, '$Search_for2', '$Replace_with') WHERE $search_what Like '$Search_for'");              
}

Hope that helps someone.

Fred
  • 33
  • 8