I've seen this question: PHP preg_match bible scripture format
But my problem is a little different: I want to extract the individual elements, not just match them, and the formats I need to handle are more varied:
'John 14:16–17, 25–26'
'John 14:16–17'
'John 14:16'
'John 14 16'
'John14 : 16'
'John14: 16'
'John14:16—17'
'John14 16 17'
'John14 : 16 17'
'John14 : 16 — 17'
'John 14 16 17'
'约翰福音 14 16 17' (an actual example with Unicode text; 约翰福音 is Chinese for the Gospel of John)
The separators '-', ':' and ' ' may each appear in either half-width or full-width form (for example '－', '：' and the ideographic space '　'), and both forms should work.
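One way to make both widths work is to normalize the separators to their half-width ASCII forms before matching, so the pattern only has to deal with ':', '-' and ' '. This is a minimal sketch assuming UTF-8 input and PHP 7.0+ (for the \u{...} string escape); normalizeSeparators is a hypothetical name, and mapping the en and em dashes used in the examples to '-' is an added assumption rather than part of the stated requirement.

```php
<?php
// Hypothetical helper: map full-width separators (and the en/em dashes seen in
// the examples) to half-width ASCII so one simple pattern can handle every input.
// Assumes the input string is UTF-8.
function normalizeSeparators(string $s): string
{
    return strtr($s, [
        "\u{FF1A}" => ':', // FULLWIDTH COLON
        "\u{FF0D}" => '-', // FULLWIDTH HYPHEN-MINUS
        "\u{3000}" => ' ', // IDEOGRAPHIC SPACE (full-width space)
        "\u{2013}" => '-', // EN DASH (assumption: treated as a verse-range dash)
        "\u{2014}" => '-', // EM DASH (same assumption)
    ]);
}

echo normalizeSeparators("John14\u{FF1A}16\u{2014}17"), PHP_EOL; // John14:16-17
```

strtr() with an array replaces whole byte sequences, so multibyte UTF-8 keys work without the mbstring extension.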
What I want is to extract the book name 'John' (which must support Unicode), the chapter 14, the verse 16, and the end verse 17 if it exists.
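As a minimal sketch of the extraction itself, assuming UTF-8 input and PHP 7.1+: the hypothetical parseReference() uses \p{L}+ (any Unicode letter, so both 'John' and '约翰福音' qualify) for the book name, named capture groups for the numbers, and lists the half-width and full-width separators explicitly in character classes. It assumes the book name is letters only (so something like '1 John' is out of scope) and that the numbers use ASCII digits.

```php
<?php
// Hypothetical helper (not from the question): splits one reference such as
// "John 14:16–17" or "约翰福音 14 16 17" into book, chapter, verse and optional end verse.
function parseReference(string $ref): ?array
{
    // Whitespace: \s plus U+3000 (ideographic space).
    // Chapter/verse separator: whitespace, ':' or U+FF1A (full-width colon).
    // Verse-range separator: whitespace, '-', U+FF0D (full-width hyphen),
    // U+2013 (en dash) or U+2014 (em dash).
    $pattern = '/^[\s\x{3000}]*(?<book>\p{L}+)[\s\x{3000}]*(?<chapter>[0-9]+)'
             . '[\s\x{3000}:\x{FF1A}]+(?<verse>[0-9]+)'
             . '(?:[\s\x{3000}\-\x{FF0D}\x{2013}\x{2014}]+(?<end>[0-9]+))?/u';

    if (preg_match($pattern, $ref, $m) !== 1) {
        return null; // no book + chapter + verse found
    }

    return [
        'book'    => $m['book'],
        'chapter' => (int) $m['chapter'],
        'verse'   => (int) $m['verse'],
        // Trailing unmatched groups are left out of $m, so 'end' may be missing.
        'end'     => isset($m['end']) && $m['end'] !== '' ? (int) $m['end'] : null,
    ];
}
```

Because the pattern is anchored with ^ but not $, anything after the first range (such as ', 25–26' in the first example) is simply ignored. The (int) casts are a design choice; drop them if you would rather keep the captured strings.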
I've tried:
$str = '10 : 12 — 15 % 52 .633 __+_+)_01(&( %&@#32$%!85#@60$';
preg_match_all('/[\d]+?/isU',$str, $t);
That didn't work very well: it just collects every run of digits, with no notion of book, chapter or verse.
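Running that first attempt on its own test string makes the problem concrete. The U (ungreedy) modifier inverts greediness, so `+?` behaves like a plain greedy `+` here, and each maximal run of digits becomes one match, whatever surrounds it:

```php
<?php
$str = '10 : 12 — 15 % 52 .633 __+_+)_01(&( %&@#32$%!85#@60$';

// Same call as in the question. Under the U modifier, "+?" is greedy,
// so every maximal run of ASCII digits is returned as one match.
preg_match_all('/[\d]+?/isU', $str, $t);

print_r($t[0]); // 10, 12, 15, 52, 633, 01, 32, 85, 60
```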
Then I tried:
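// This fails: PHP takes the leading "(" and its matching ")" as the pattern delimiters,
// so everything after them is read as modifiers ("Unknown modifier" warning).
// PCRE also rejects \uXXXX escapes (\x{XXXX} with the u modifier is the supported form),
// and a CJK-only range could never match "John" anyway.
preg_match_all("([\u4e00-\u9fa5]+)[^\d\n]*(\d+)[^\d\n]*(\d+)[^\d\n]*(\d*)", "John 14:16", $out);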
var_dump($out);
That didn't work either.
OK, I found a solution that works, but I'm not sure it's 100% correct:
preg_match_all('#([\x{4e00}-\x{9fa5}]+)[^\d\n]*(\d+)[^\d\n]*(\d+)[^\d\n]*(\d*)#u', $keyword, $match);
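A quick way to see where this pattern falls short is to run it over two of the inputs listed above ($results is just an illustrative name). It extracts the Chinese example correctly, but returns nothing for 'John 14:16', because the first group accepts only CJK ideographs in U+4E00 to U+9FA5; a broader class such as \p{L} (any Unicode letter, which needs the u modifier) would also accept Latin names.

```php
<?php
// The pattern from the question, unchanged.
$pattern = '#([\x{4e00}-\x{9fa5}]+)[^\d\n]*(\d+)[^\d\n]*(\d+)[^\d\n]*(\d*)#u';

$results = [];
foreach (['约翰福音 14 16 17', 'John 14:16'] as $keyword) {
    // PREG_SET_ORDER groups each match's captures together:
    // [full match, book, chapter, verse, end verse].
    preg_match_all($pattern, $keyword, $match, PREG_SET_ORDER);
    $results[$keyword] = $match;
}

print_r($results);
```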