I am using preg_split to split a string into words.

However, it is not working for one particular string that is fetched from a MySQL text column.

If I assign the string to a variable manually, it works correctly, but it does not work when the same string is fetched from the database.

Here is the simple code I am using:

//The failing string. When manually assigned like this it works correctly

$string = "<p><strong>Iden is lesz lehetoseg a foproba és a koncert napjan ebedet kerni a MUPA-ban. Ára 1000-1200 Ft körül várható. Azoknak, akik még nem jártak a MUPA-ban ingyenes bejarasi lehetoseget biztositunk. Tovabba segitunk a pesti szallas megszervezeseben is, ha igenyt tartotok ra.</strong></p>";

$string = strip_tags(trim($string));

$words = preg_split('/\PL+/u', $string, null, PREG_SPLIT_NO_EMPTY);

Here is what preg_split returns when called on the string from the database:

array(1) { [0]=> string(269) "Iden is lesz lehetoseg a foproba és a koncert napjan ebedet kerni a MUPA-ban. Ára 1000-1200 Ft körül várható. Azoknak, akik még nem jártak a MUPA-ban ingyenes bejarasi lehetoseget biztositunk. Tovabba segitunk a pesti szallas megszervezeseben is, ha igenyt tartotok ra." }

Does anyone know what is causing preg_split to fail for this string?

Thanks

Paul Atkins
  • How does it fail, exactly? – mister martin Apr 26 '16 at 20:15
  • @mistermartin - Please see in question for what it returns. It returns back the whole string instead of splitting it – Paul Atkins Apr 26 '16 at 20:18
  • [The code looks working](http://ideone.com/UhokFC). Maybe `-1` is better than `null`? Try `$words = preg_split('/\PL+/u', $string, -1, PREG_SPLIT_NO_EMPTY);` – Wiktor Stribiżew Apr 26 '16 at 20:18
  • @PaulAtkins You are showing us the code that works, not the code that doesn't work. – mister martin Apr 26 '16 at 20:20
  • Your comment says that it works when you assign it manually. @WiktorStribiżew's test page also works. How do you get that string? – Emil Borconi Apr 26 '16 at 20:20
  • I've had a [similar problem](http://stackoverflow.com/questions/30174237/throw-exception-message-passed-into-constructor-wont-appear-in-php-error/30174949#30174949) with thrown errors. Turns out PHP isn't too fond of Hungarian (or, well, any UTF-8 text) off-the-shelf. I'd say make sure files are encoded in UTF-8, and that the database is configured accordingly. – John Weisz Apr 26 '16 at 20:20
  • It seems to be a problem of encoding (see [this](http://stackoverflow.com/questions/3076535/problem-getting-text-field-as-string-from-mysql-with-php)). Make sure you get the string in UTF8 encoding from MySQL, and it will work. – Wiktor Stribiżew Apr 26 '16 at 20:32
  • Also, try [this solution](http://stackoverflow.com/a/14068747/3832970) – Wiktor Stribiżew Apr 26 '16 at 20:37
  • Thank you @all. I do believe it is an encoding issue. – Paul Atkins Apr 26 '16 at 20:50
  • Could you let us know if/when the links above solved the issue? Thanks. – Wiktor Stribiżew Apr 26 '16 at 21:04
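
The encoding theory from the comments can be checked without touching the database. Below is a minimal sketch, not the asker's code: `try_unicode_split` is a hypothetical helper, and the sample bytes are an assumption about what a non-UTF-8 connection might hand back ("é" as the single byte 0xE9). With the `/u` modifier, PCRE rejects a subject that is not valid UTF-8, and `preg_last_error()` reports why.

```php
<?php
// Hypothetical helper: runs the question's split and reports whether it
// failed because the subject was not valid UTF-8.
function try_unicode_split(string $text): array
{
    $words = preg_split('/\PL+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        return [
            'ok'       => false,
            'bad_utf8' => preg_last_error() === PREG_BAD_UTF8_ERROR,
            'words'    => [],
        ];
    }
    return ['ok' => true, 'bad_utf8' => false, 'words' => $words];
}

// Assumed sample: "és a koncert" with "é" as one legacy byte (0xE9).
$legacyBytes = "\xE9s a koncert";
// The same text as real UTF-8 ("\u{E9}" expands to the two-byte sequence).
$utf8Text = "\u{E9}s a koncert";

var_dump(try_unicode_split($legacyBytes)['bad_utf8']); // bool(true)
var_dump(try_unicode_split($utf8Text)['ok']);          // bool(true)
```

If the first check reports a bad-UTF-8 error for the value read from MySQL, the connection charset is the likely place to look.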

2 Answers

1

I tested your code with a string from the database and got the same error. Changing the regular expression solves it. Use this expression:

// -1 means "no limit"; passing null for the limit is deprecated as of PHP 8.1
$words = preg_split('/[\s]/', $string, -1, PREG_SPLIT_NO_EMPTY);


//var_dump result

array(42) {
  [0]=>
  string(4) "Iden"
  [1]=>
  string(2) "is"
  [2]=>
  string(4) "lesz"
  [3]=>
  string(9) "lehetoseg"
...
}

UPDATE: The /u modifier makes PCRE treat the pattern and the subject as UTF-8. Maybe your database is not returning UTF-8, and that is why the expression did not work.
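
If the text really does arrive in a legacy encoding, the original `\PL+` pattern can be kept by converting to UTF-8 first. This is only a sketch under assumptions: `split_words` is a hypothetical helper, and ISO-8859-2 is a guess at the source encoding (it is common for Hungarian text, but nothing in the question confirms it). It relies on the `iconv` extension.

```php
<?php
// Hypothetical helper. $assumedEncoding is a guess, not something the
// question states; adjust it to whatever the connection actually uses.
function split_words(string $text, string $assumedEncoding = 'ISO-8859-2'): array
{
    // An empty /u pattern matches (returns 1) only if $text is valid UTF-8.
    if (preg_match('//u', $text) !== 1) {
        $converted = iconv($assumedEncoding, 'UTF-8', $text);
        if ($converted === false) {
            return []; // conversion failed; nothing sensible to split
        }
        $text = $converted;
    }

    $words = preg_split('/\PL+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return $words === false ? [] : $words;
}

// Assumed sample: "Ára 1000-1200 Ft körül" as legacy single-byte text.
print_r(split_words("\xC1ra 1000-1200 Ft k\xF6r\xFCl"));
```

Converting after the fact is a workaround. Getting UTF-8 from MySQL in the first place, as the comments suggest, is the more robust fix.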

0

You don't need a regex for this; explode will do the job:

$string = "<p><strong>Iden is lesz lehetoseg a foproba és a koncert napjan ebedet kerni a MUPA-ban. Ára 1000-1200 Ft körül várható. Azoknak, akik még nem jártak a MUPA-ban ingyenes bejarasi lehetoseget biztositunk. Tovabba segitunk a pesti szallas megszervezeseben is, ha igenyt tartotok ra.</strong></p>";
$string = strip_tags(trim($string));
$words = explode(" ", $string);
print_r($words);

Output:

Array
(
    [0] => Iden
    [1] => is
    [2] => lesz
    [3] => lehetoseg
    [4] => a
    [5] => foproba
    [6] => és
    [7] => a
    [8] => koncert
...

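
One thing to keep in mind when choosing between the two approaches: `explode(" ", ...)` splits only on the literal space character, while the question's `\PL+` pattern splits on every run of non-letters. A small comparison on a fragment of the sample text (the fragment is chosen here purely for illustration):

```php
<?php
// Fragment of the sample text; "\u{C1}" is "Á" written as an escape so the
// example does not depend on the file's own encoding.
$text = "a MUPA-ban. \u{C1}ra 1000-1200 Ft";

// explode: punctuation and digits stay inside the tokens.
$bySpace = explode(" ", $text);
// ["a", "MUPA-ban.", "Ára", "1000-1200", "Ft"]

// \PL+ with /u: hyphens, dots, and numbers act as separators.
$byLetters = preg_split('/\PL+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
// ["a", "MUPA", "ban", "Ára", "Ft"]

print_r($bySpace);
print_r($byLetters);
```

Which result is wanted depends on what should count as a word.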

Pedro Lobito