4

I want to split a text message into an array at every space. This worked fine until I received the text message below. Here are the few lines of code that process the string:

    $str = 'T bw4  05/09/19 07:51 am BW6N 499.803';
    $cleanStr = iconv("UTF-8", "ISO-8859-1", $str);
    $strArr = preg_split('/[\s\t]/', $cleanStr);
    var_dump($strArr);

Var_dump yields this result:

array:6 [▼
 0 => "T"
 1 => b"bw4  05/09/19"
 2 => "07:51"
 3 => "am"
 4 => "BW6N"
 5 => "499.803"
]

Item #1 in the array, 1 => b"bw4 05/09/19", is not correct. I cannot figure out what the letter "b" in front of the array value is, or why the space(s) between "bw4" and "05/09/19" did not split the string. Any suggestions on a better way to split the string are greatly appreciated. Here is the original string: https://3v4l.org/2L35M and here is an image of the result from my localhost: http://prntscr.com/jjbvny

Wiktor Stribiżew
Guntar
  • Not seeing it ~ https://3v4l.org/TnmK5. You sure you're representing the string correctly here? – Phil May 18 '18 at 01:27
  • It is tricky! When I posted the question, some of the characters probably got filtered out. Here is the original string: https://3v4l.org/2L35M and here is the image of result from my localhost: http://prntscr.com/jjbvny – Guntar May 18 '18 at 01:51
  • 2
    Possible duplicate of [What does the b in front of string literals do?](https://stackoverflow.com/questions/4749442/what-does-the-b-in-front-of-string-literals-do) – Daniel A. White May 18 '18 at 02:03
  • "`array:6 [▼`" is not *standard `var_dump`*! Are you using some framework or PHP extension that provides some prettified dump? – deceze May 18 '18 at 07:12
  • @Daniel The thing is, that was supposed to be a forward compatibility annotation with PHP 6, which never happened. To this day there's no difference between binary strings and non-binary strings. And it's unclear what is *outputting* this exactly. – deceze May 18 '18 at 07:14

3 Answers

6

To match any 1 or more Unicode whitespace chars you may use

'~\s+~u'

Your '/[\s\t]/' pattern matches only a single whitespace char (\s) or a tab (\t), and the \t is redundant because \s already matches tabs. More importantly, since the u modifier is missing, \s cannot match the \u00A0 chars (hard spaces) you have after bw4.

So, use

$str = 'T bw4  05/09/19 07:51 am BW6N 499.803';
$strArr = preg_split('/\s+/u', $str);
print_r($strArr);

See the PHP demo yielding

Array
(
    [0] => T
    [1] => bw4
    [2] => 05/09/19
    [3] => 07:51
    [4] => am
    [5] => BW6N
    [6] => 499.803
)
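
To check whether a string really contains hard spaces, the following minimal sketch may help. It assumes the two characters after bw4 are U+00A0 (written with the PHP 7+ "\u{00A0}" escape); the helper name splitOnWhitespace is hypothetical and not part of the answer above.

```php
<?php
// Hypothetical helper: split on runs of Unicode whitespace, dropping empty pieces.
function splitOnWhitespace(string $text): array
{
    // With the `u` modifier the subject is read as UTF-8, so \s also covers U+00A0.
    $parts = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure, e.g. when $text is not valid UTF-8.
    return $parts === false ? [] : $parts;
}

// Assumption: the message contains two no-break spaces after "bw4".
$sample = "T bw4\u{00A0}\u{00A0}05/09/19 07:51 am BW6N 499.803";

// In UTF-8 a no-break space is the two bytes C2 A0, which bin2hex() makes visible.
echo bin2hex("\u{00A0}"), "\n";                      // c2a0
echo implode('|', splitOnWhitespace($sample)), "\n"; // T|bw4|05/09/19|07:51|am|BW6N|499.803
```

Note that the sketch splits the UTF-8 string directly. Converting it to ISO-8859-1 first, as in the question, would turn each hard space into the single byte A0, which is not valid UTF-8, so a pattern with the u modifier would then fail instead of splitting.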
Wiktor Stribiżew
1

I guess your input is not properly encoded. Try:

$cleanStr = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', utf8_encode($str));

This cleans the string for me: https://3v4l.org/d80QS (if it's displayed correctly this time).

Note: This could also mean the encoding gets damaged somewhere along the way: in your database (is the text stored in UTF-8?), in your web server (is AddDefaultCharset UTF-8 set in Apache's httpd.conf file?), in PHP (what is default_charset in your php.ini file; is it "utf-8"?), on the website (<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />), or through a BOM (byte order mark) at the beginning of your source file.
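
Before converting anything, it can help to test whether the incoming bytes are valid UTF-8 at all. The sketch below is only one way to do that; the helper names isValidUtf8 and toUtf8 are hypothetical, and the fallback source encoding ISO-8859-1 is an assumption that has to match wherever your data really comes from.

```php
<?php
// Hypothetical helper: an empty pattern with the `u` modifier only matches
// when the subject is well-formed UTF-8.
function isValidUtf8(string $raw): bool
{
    return preg_match('//u', $raw) === 1;
}

// Hypothetical helper: leave valid UTF-8 alone, otherwise convert from an
// assumed source encoding (ISO-8859-1 here is a guess, not a detected fact).
function toUtf8(string $raw, string $assumedSource = 'ISO-8859-1'): string
{
    if (isValidUtf8($raw)) {
        return $raw;
    }
    $converted = iconv($assumedSource, 'UTF-8', $raw);

    // iconv() returns false on failure; fall back to the untouched input.
    return $converted === false ? $raw : $converted;
}

$latin1 = "bw4\xA0\xA005/09/19"; // hard spaces as single ISO-8859-1 bytes

var_dump(isValidUtf8($latin1));    // bool(false)
echo bin2hex(toUtf8($latin1)), "\n"; // 627734c2a0c2a030352f30392f3139
```

Checking first avoids double-encoding a string that is already UTF-8, which is a risk when a conversion is applied unconditionally.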

wp78de
0

Since you mention that the values are not being separated properly, first try to trim your string at both ends.

Next, replace multiple spaces in your string with a single space:

$output = preg_replace('!\s+!u', ' ', trim($str," "));

After that you can explode based on space

$fout = explode(" ",$output);

Then you can print it.

As for the b prefix, the link that @Daniel A. White posted is the relevant answer.
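
As a small illustration of the b prefix, the sketch below shows that a b'...' literal in PHP is an ordinary string. The second half rests on an assumption that is not confirmed in this thread: that the dump helper in the question marks a string with b when its bytes are not valid UTF-8, which would be the case for the output of the question's iconv() call.

```php
<?php
// The b prefix is accepted by the parser but has no effect on the value.
$plain  = 'bw4 05/09/19';
$binary = b'bw4 05/09/19';

$sameValue = ($plain === $binary);
$sameType  = (gettype($binary) === 'string');

// Assumption: these are the bytes left after converting the hard spaces
// to ISO-8859-1, i.e. two single A0 bytes.
$latin1    = "bw4\xA0\xA005/09/19";
$validUtf8 = (preg_match('//u', $latin1) === 1);

var_dump($sameValue); // bool(true)
var_dump($sameType);  // bool(true)
var_dump($validUtf8); // bool(false)
```

If that assumption holds, the b would disappear once the string is kept in UTF-8 rather than converted to ISO-8859-1 before dumping.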

Rabih Melko