1

I am using the following code on the following meta tag, and it is resulting in an Incorrect string value in MySQL

PHP preg_replace code

 $content = $link->getAttribute('content');
 $content = preg_replace('/[^a-zA-Z\d\s\.]/', '', $content);

Contents of $content

 FGC ferrocarrils de la generalitat de catalunya tren ferrocarril funicular cremallera aeri telefric teleferic bitllets fgc bitllets integrats bitllets turstics bitllets turistics turisme i muntanya tarifes horaris abonaments pl�nol xarxa planol xarxa lnia barcelonavalls linia barcelonavalles lnia LlobregatAnoia linia LlobregatAnoia  Metro del Valls Metro del Valles Metro del Baix Llobregat lnies urbanes bcn linies urbanes bcn lnies suburbanes bcn linies suburbanes bcn barcelonasabadell barcelonaterrassa barcelonaigualada barcelonamanresa barcelonamartorell montserrat nria nuria vall de nria vall de nuria la Molina lleidala pobla gelida olesaesparraguera tren del ciment tren de lensenyament tren de vapor estacions accessibles l6 l7 s1 s2 s5 s55 l8 s33 s4 s8 r5 r6 transporttransport escolar projectes educatius tren de lensenyament programes pedaggics activitats educatives activitats pedaggiques histria fgc trens vapor cremallera funicular locomotora vagons estacions transports mobilitat pblic Rub Martorell Montserrat Vallvidrera Catalunya Turisme Via neu esqu snowboard esports dhivern bicicleta Mountain Bike accessibilitat Transport sostenible PMR Mobilitat reduda

Meta Tag

 <meta name="keywords" content="FGC, ferrocarrils de la generalitat de catalunya, tren, ferrocarril, funicular, cremallera, aeri, telefèric, teleferic, bitllets fgc, bitllets integrats, bitllets turístics, bitllets turistics, turisme i muntanya, tarifes, horaris, abonaments, plànol xarxa, planol xarxa, línia barcelona-vallès, linia barcelona-valles, línia Llobregat-Anoia, linia Llobregat-Anoia,  Metro del Vallès, Metro del Valles, Metro del Baix Llobregat, línies urbanes bcn, linies urbanes bcn, línies suburbanes bcn, linies suburbanes bcn, barcelona-sabadell, barcelona-terrassa, barcelona-igualada, barcelona-manresa, barcelona-martorell, montserrat, núria, nuria, vall de núria, vall de nuria, la Molina, lleida-la pobla, gelida, olesa-esparraguera, tren del ciment, tren de l'ensenyament, tren de vapor, estacions accessibles, l6, l7, s1, s2, s5, s55, l8, s33, s4, s8, r5, r6, transport,transport escolar, projectes educatius, tren de l'ensenyament, programes pedagògics, activitats educatives, activitats pedagògiques, història fgc, trens, vapor, cremallera, funicular, locomotora, vagons, estacions, transports, mobilitat, públic, Rubí, Martorell, Montserrat, Vallvidrera, Catalunya, Turisme, Via, neu, esquí, snowboard, esports d'hivern, bicicleta, Mountain Bike, accessibilitat, Transport sostenible, PMR, Mobilitat reduïda"/>

MySQL Error

General error: 1366 Incorrect string value: '\xA0nol x...' for column 'meta_content' at row 1

I am using CakePHP 3.4 to Insert the data, using the following lines...

$metaEntity = $metaTable->newEntity([
                    'domains_host_id' => $hostId,
                    'meta_name' => '',
                    'meta_property' => $property,
                    'meta_content' => $content,
                ]);
$metaTable->save($metaEntity);

And here is my database details...

 +------------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+
 | Name                   | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time | Collation       | Checksum | Create_options | Comment |
 +------------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+
 | domains_software_metas | InnoDB |      10 | Dynamic    | 2677 |            116 |      311296 |               0 |            0 |   2097152 |           2678 | 2017-10-15 10:49:07 | 2017-10-15 11:26:32 | NULL       | utf8_general_ci |     NULL |                |         |
 +------------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+

Any ideas on how I can get that preg_replace to replace that nasty character?

Jeffrey L. Roberts
  • 2,844
  • 5
  • 34
  • 69

2 Answers2

1

I'd suggest you just convert the table to UTF8 so you can support all characters.

To answer the question though, based on the output of $content, pl�nol, it looks like the preg_replace is processing the string as single byte characters. To get it to process as unicode add the u modifier.

$content = preg_replace('/[^a-zA-Z\d\s\.]/u', '', $content);

http://php.net/manual/en/reference.pcre.pattern.modifiers.php

chris85
  • 23,846
  • 7
  • 34
  • 51
0

Your issue is certainly in this part of your content string plànol xarxa

Why mySql isn't handling this correctly, I'm not sure. PHP's DOM* classes expect to work with UTF8 encoding, which you may not have based on the '\xA0nol x...' text in the error message.

When loading the HTML, try a utf8_decode() first. This will convert ISO-8859-1 to UTF-8.

$document = new DOMDocument();
$html = getHtml(); // load your html code
$document->loadHTML(utf8_decode($html));
$link = findLink($document); // your code to find the link
$content = $link->getAttribute('content');
$content = preg_replace('/[^a-zA-Z\d\s\.]/', '', $content);
AndrewR
  • 6,668
  • 1
  • 24
  • 38