Try this pattern:
(^([\s\S]*?)(?=<div>))|(((?<=<\/div>))([\s\S]*?)(?=<div>))|((?<=<\/div>)[\s\S]*)
How it works
^
Matches the beginning of the string
\s
Matches any whitespace character (spaces, tabs, line breaks)
\S
Matches any character that is not a whitespace character. Combined as [\s\S], the two classes match any character at all, including line breaks.
*
Matches the preceding token zero or more times, so [\s\S]* matches anything
?
Placed after a quantifier, makes it non-greedy (match the minimum number of characters required)
|
Alternation: matches either the pattern on its left or the one on its right
()
Groups the enclosed expression and captures what it matches
(?=<div>)
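A positive lookahead: it requires <div> to follow the current position, but does not include it in the match.
(?<=<\/div>)
A positive lookbehind: it requires </div> to come immediately before the current position, again without including it in the match. The \/ is simply an escaped /.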
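To see the three alternatives working together, here is a minimal sketch in Python. Python is an assumption, since the pattern is not tied to a language; any engine that supports lookbehind should behave similarly. The function name text_outside_divs is made up for illustration, and the sample is the string from this answer.

```python
import re

# The pattern from above, unchanged, split over three lines for readability.
PATTERN = re.compile(
    r"(^([\s\S]*?)(?=<div>))"                  # text before the first <div>
    r"|(((?<=<\/div>))([\s\S]*?)(?=<div>))"    # text between </div> and the next <div>
    r"|((?<=<\/div>)[\s\S]*)"                  # text after the last </div>
)

def text_outside_divs(html):
    """Return the pieces of text that lie outside <div>...</div> (hypothetical helper)."""
    # finditer + group(0) yields whole matches; findall would return tuples
    # because the pattern contains capture groups.
    return [m.group(0) for m in PATTERN.finditer(html)]

sample = ("Match me1 <div><div>Hello World!</div> Match me 2 "
          "<div>Hello World!</div> Match me 3.")
print(text_outside_divs(sample))
# ['Match me1 ', ' Match me 2 ', ' Match me 3.']
```

Note that the matches keep their surrounding spaces; call .strip() on each piece if that is unwanted.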
Why is ? needed here?
Match me1 <div><div>Hello World!</div> Match me 2 <div>Hello World!</div> Match me 3.
By default, quantifiers are greedy, meaning they match as much as possible. Without the ?, the first alternative would therefore select all the text up to the third (last) <div>. Adding the non-greedy modifier ? makes the regex select only the text up to the first <div>.
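The difference is easy to check. The sketch below, again assuming Python, runs only the first alternative of the pattern on the sample string, once with and once without the ?.

```python
import re

sample = ("Match me1 <div><div>Hello World!</div> Match me 2 "
          "<div>Hello World!</div> Match me 3.")

# First alternative only, without and with the non-greedy "?".
greedy = re.search(r"^[\s\S]*(?=<div>)", sample).group(0)
lazy = re.search(r"^[\s\S]*?(?=<div>)", sample).group(0)

print(repr(greedy))  # 'Match me1 <div><div>Hello World!</div> Match me 2 '
print(repr(lazy))    # 'Match me1 '
```

The greedy version first consumes the whole string and backtracks to the last place where <div> follows, while the non-greedy version grows one character at a time and stops at the first one.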