PHP regular expression fails when a non-UTF-8 character is found
I need to process 40,000 database records and extract a width and height value from a custom_size
MySQL table field.
The field comes in all sorts of different formats.
The most reliable way I have found is to grab the numeric value on the left and right side of an x
and strip all non-numeric characters from them.
The code below works well about 99% of the time, until it hits a few records with non-UTF-8 characters.
31*32
and 35”x21”
are 2 examples.
When these are run, I get the following PHP errors and the script halts:
Warning: preg_replace(): Compilation failed: this version of PCRE is not compiled with PCRE_UTF8 support at offset 1683977065 on line 21
Warning: preg_match(): Compilation failed: this version of PCRE is not compiled with PCRE_UTF8 support at offset 0 on line 24
Demo:
<?php
$strings = array(
    '12x12',
    '172.61 cm x 28.46 cm',
    '31"x21"',
    '1"x1"',
    '31*32',
    '35”x21”'
);
foreach($strings as $string){
    if($string != ''){
        $string = str_replace('”','"',$string);
        // Strip out all characters except for numbers, letter x, and decimal points
        $string = preg_replace( '/([^0-9x\.])/ui', '', strtolower( $string ) );
        // Find anything that fits the number X number format
        preg_match( '/([0-9]+(\.[0-9]+)?)x([0-9]+(\.[0-9]+)?)/ui', $string, $values );
        echo 'Original value: ' .$string.'<br>';
        echo 'Width: ' .$values[1].'<br>';
        echo 'Height: ' .$values[3].'<br><hr><br>';
    }
}
Any ideas on how to work around this? I cannot rebuild the server software to add PCRE UTF-8 support.
I just found an answer with a PHP library for converting to UTF-8 that seems to be helping a lot: https://stackoverflow.com/a/3521396/143030
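Another thing I am considering, as a rough sketch only: if the warnings really come from the u modifier on the two patterns, then dropping it may be enough, because the patterns only look for ASCII digits, x, and a decimal point and can work on raw bytes. The helper name parse_size and the handling of * as a separator are my own assumptions for illustration, not something the data guarantees.

```php
<?php
// Hypothetical helper (name and return shape are assumptions for illustration).
// Parses "width x height" using byte-mode patterns, i.e. without the "u" modifier,
// so PCRE does not need UTF-8 support.
function parse_size($raw)
{
    // Lowercase so "X" and "x" are treated alike.
    // Assumption: "*" is used as a separator in some records, so map it to "x".
    $s = str_replace('*', 'x', strtolower($raw));

    // Remove every byte that is not a digit, "x", or ".".
    // This also drops the bytes of multibyte characters such as curly quotes.
    $s = preg_replace('/[^0-9x.]/', '', $s);

    // Find the first "number x number" pair; decimals are optional.
    if (preg_match('/([0-9]+(?:\.[0-9]+)?)x([0-9]+(?:\.[0-9]+)?)/', $s, $m)) {
        return array('width' => $m[1], 'height' => $m[2]);
    }

    // No usable pair found.
    return null;
}
```

Calling parse_size() on each custom_size value and checking for null before reading width and height would also avoid undefined index notices on records that do not match.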