
I have this kind of string:

Blabla1 Blaabla2<br />  Blaabla3 Blaabla4

I'm trying to split the string into words wherever there is a space (" ") or a "<br />", using preg_split.

What I expect:

Blabla1
Blaabla2<br />
Blaabla3
Blaabla4

I tried this regex, (?:(<br\s))|\s, but I can't manage to exclude the "/>".

http://regexr.com/3aqs0
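A minimal PHP sketch, using the sample string above, of why the attempted pattern leaves "/>" behind: `<br\s` consumes only `<br ` as the delimiter, so the rest of the tag becomes its own piece, and the two spaces after the tag produce an empty piece between them.

```php
<?php
// The attempted pattern from the question: split on "<br" followed by one
// whitespace character, or on a single whitespace character.
$str   = 'Blabla1 Blaabla2<br />  Blaabla3 Blaabla4';
$parts = preg_split('/(?:(<br\s))|\s/', $str);

// "<br " is consumed as a delimiter, so the leftover "/>" becomes a piece,
// and the second space after the tag yields an empty piece.
// $parts is ['Blabla1', 'Blaabla2', '/>', '', 'Blaabla3', 'Blaabla4']
print_r($parts);
```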

Thanks!

Zagloo

2 Answers


One way you could do this:

$str = 'Blabla1 Blaabla2<br />  Blaabla3 Blaabla4';
$results = preg_split('~(?:<br[^>]*>\s*\K|\s+)~', $str);
print_r($results);

Output

Array
(
    [0] => Blabla1
    [1] => Blaabla2<br />  
    [2] => Blaabla3
    [3] => Blaabla4
)
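Since the asker requests an explanation in the comments below, here is one way to annotate the same pattern: a sketch that rewrites it with PCRE's x (extended) modifier, which ignores unescaped whitespace and #-comments in the pattern, so it should behave exactly like the one-line version.

```php
<?php
$str = 'Blabla1 Blaabla2<br />  Blaabla3 Blaabla4';

// Same pattern as above, spread out with the x modifier so each part can be
// commented. Whitespace outside character classes and text after # are ignored.
$pattern = '~
    (?:
        <br [^>]* >   # a br tag, with or without a space or closing slash
        \s*           # any whitespace that follows the tag
        \K            # reset the reported match start: the tag and its trailing
                      # whitespace stay attached to the preceding word, and the
                      # split point becomes a zero-width position
      |
        \s+           # otherwise split on (and discard) a run of whitespace
    )
~x';

$annotated = preg_split($pattern, $str);
$compact   = preg_split('~(?:<br[^>]*>\s*\K|\s+)~', $str);

// Both calls should return the same four pieces.
var_dump($annotated === $compact);
```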
hwnd
  • It works nicely, thanks!! But could you please explain the whole thing, for my own information? :) It interests me! – Zagloo Apr 15 '15 at 17:31
  • Do you just want to split on whitespace that is not inside HTML? Also note that this will also split before the `<br />` if a space precedes it. I am not clear on exactly what you're trying to achieve here. – hwnd Apr 15 '15 at 17:36
  • In fact, I'm using a text editor (CKEditor). When I save the textarea to the database, I need to keep all the tags (…). Then, when I reload the page in my text editor, I need to split out each word together with its associated tags, because my text editor is coupled with an audio player (JW Player) that underlines each word as playback time advances... – Zagloo Apr 16 '15 at 08:10
  • Another question: what if my string has two or more consecutive `<br />` tags, like `$str = 'Blabla1 Blaabla2<br /><br /> Blaabla3 Blaabla4';`? How do I handle that? – Zagloo Apr 16 '15 at 08:12
  • You could do `(?:(?:<br[^>]*>\s*)+\K|\s+)` – hwnd Apr 16 '15 at 13:20
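A PHP sketch that tries the comment's variant on two inputs: consecutive `<br />` tags, and a space before the tag (the caveat raised in an earlier comment). Both sample strings are assumptions made up for illustration.

```php
<?php
// Variant from the comment: (?:<br[^>]*>\s*)+ swallows a whole run of br tags
// (and the whitespace after each) before \K, so they stay with the previous word.
$pattern = '~(?:(?:<br[^>]*>\s*)+\K|\s+)~';

$consecutive = preg_split($pattern, 'Blabla1 Blaabla2<br /><br /> Blaabla3 Blaabla4');
// ['Blabla1', 'Blaabla2<br /><br /> ', 'Blaabla3', 'Blaabla4']

// Caveat: a space before the tag is a \s+ split point, so the tag ends up
// as its own piece instead of being attached to the previous word.
$spaceBefore = preg_split($pattern, 'Blaabla2 <br /> Blaabla3');
// ['Blaabla2', '<br /> ', 'Blaabla3']

print_r($consecutive);
print_r($spaceBefore);
```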

If the string contains no other HTML, it's fine to use a regex. Otherwise, there are much better tools, such as a real HTML parser.

Use <br(\s\/)?>\K|\s:

$matches = preg_split('/<br(\s\/)?>\K|\s/',$string);

This also matches <br> (which is valid HTML too).
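A quick PHP sketch of what this pattern returns without any flags, using the sample string from the question plus a made-up `<br>` example: the tag stays on the preceding word, but each of the two spaces after `<br />` is its own split point, so empty strings are left behind.

```php
<?php
$pattern = '/<br(\s\/)?>\K|\s/';

$withSlash = preg_split($pattern, 'Blabla1 Blaabla2<br />  Blaabla3 Blaabla4');
// ['Blabla1', 'Blaabla2<br />', '', '', 'Blaabla3', 'Blaabla4']

// The optional group (\s\/)? also lets a bare <br> match.
$bare = preg_split($pattern, 'a<br>b c');
// ['a<br>', 'b', 'c']

print_r($withSlash);
print_r($bare);
```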

Consider adding the PREG_SPLIT_NO_EMPTY flag, because the two spaces after the <br /> in your example string produce empty elements:

preg_split('/<br(\s\/)?>\K|\s/', $string, -1, PREG_SPLIT_NO_EMPTY);
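The same call with the flag, as a self-contained PHP sketch on the question's sample string. It passes -1 ("no limit") as the limit argument, since passing null to that int parameter is deprecated as of PHP 8.1.

```php
<?php
$string = 'Blabla1 Blaabla2<br />  Blaabla3 Blaabla4';

// -1 means "no limit"; PREG_SPLIT_NO_EMPTY drops the empty pieces produced by
// the two consecutive spaces after the tag.
$matches = preg_split('/<br(\s\/)?>\K|\s/', $string, -1, PREG_SPLIT_NO_EMPTY);
// ['Blabla1', 'Blaabla2<br />', 'Blaabla3', 'Blaabla4']

print_r($matches);
```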

Update: To keep the <br />, you need to reset the match start using \K. There is a good example of this in the PHP language reference:

\K can be used to reset the match start since PHP 5.2.4. For example, the pattern foo\Kbar matches "foobar", but reports that it has matched "bar". The use of \K does not interfere with the setting of captured substrings. For example, when the pattern (foo)\Kbar matches "foobar", the first substring is still set to "foo".
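A small PHP sketch of the two manual examples quoted above, using preg_match with PREG_OFFSET_CAPTURE to show where the reported match begins. The same mechanism is what lets preg_split place the split point after the tag, so the tag stays in the preceding piece.

```php
<?php
// foo\Kbar: the whole "foobar" must match, but the reported match is "bar".
preg_match('/foo\Kbar/', 'foobar', $m, PREG_OFFSET_CAPTURE);
// $m[0] is ['bar', 3]: the match is reported as starting at offset 3

// (foo)\Kbar: \K does not clear captures, so group 1 still holds "foo".
preg_match('/(foo)\Kbar/', 'foobar', $g);
// $g is ['bar', 'foo']

print_r($m);
print_r($g);
```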

Marc
  • He can also use PREG_SPLIT_NO_EMPTY to avoid empty elements, like preg_split('/(?:(…))|\s/',$string,null,PREG_SPLIT_NO_EMPTY); – engvrdr Apr 15 '15 at 17:18
  • Correct, I forgot the flags :) I will add them. – Marc Apr 15 '15 at 17:18
  • Almost perfect :) ... How do I keep the `<br />` after Blaabla2? – Zagloo Apr 15 '15 at 17:21
  • @Marc Works nicely! :) And now, what if my string has two or more consecutive `<br />` tags, like `$str = 'Blabla1 Blaabla2<br /><br /> Blaabla3 Blaabla4';`? How do I handle that? – Zagloo Apr 16 '15 at 07:38
  • Thanks for everything! I found a solution here: http://stackoverflow.com/questions/29671967/php-improve-regex-space-and-non-capturing-group/29672376#29672440 – Zagloo Apr 16 '15 at 12:06