This is surely not the best solution, but one approach is to match a run of ASCII characters with [\x00-\x7F]+ followed by a run of non-ASCII characters (the same class negated with ^). It does not target Chinese specifically, which is tricky because Chinese characters are spread across several Unicode ranges.
$string = 'Hello World 自立合作社';
// Capture ASCII sequence into $1 and non-ASCII into $2
echo preg_replace('/([\x00-\x7F]+)([^\x00-\x7F]+)/', '$1<br/>$2', $string);
// Prints "Hello World <br/>自立合作社", which a browser renders as:
// Hello World
// 自立合作社
http://codepad.viper-7.com/1kqpOx
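To see exactly what each group captures, a small sketch with preg_match may help. It assumes the source file and the string are UTF-8 encoded; the variable name $m is just illustrative. Note that $1 keeps the trailing space, and that without the u modifier the second class matches raw bytes rather than characters.

```php
<?php
$string = 'Hello World 自立合作社';

// Same pattern as above, used with preg_match to inspect the captures.
// No u modifier here, so the classes are applied byte by byte.
preg_match('/([\x00-\x7F]+)([^\x00-\x7F]+)/', $string, $m);

var_dump($m[1]); // string(12) "Hello World "  (trailing space included)
var_dump($m[2]); // string(15) "自立合作社"     (5 characters, 3 bytes each in UTF-8)
```

If the trailing space in $1 is unwanted, it can be trimmed before the <br/> is inserted.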
Actually, here's an improved version that specifically targets Chinese characters via \p{Han}. The $2 capture also includes \s for whitespace.
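// This matches any non-Chinese in $1 followed by Chinese (and whitespace) in $2.
// The u modifier makes PCRE treat pattern and subject as UTF-8, so \p{Han} can match multibyte characters.
echo preg_replace('/([^\p{Han}]+)([\p{Han}\s]+)/u', '$1<br/>$2', $string);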
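If this is needed in more than one place, the \p{Han} approach could be wrapped in a small function. The following is only a sketch: breakBeforeHan is a hypothetical name, not part of the answer above, and it assumes UTF-8 input. It uses preg_replace_callback so the trailing whitespace of the non-Chinese run can be trimmed before the <br/> is inserted.

```php
<?php
// Hypothetical helper (name chosen for illustration): inserts <br/> between a run of
// non-Han text and the run of Han characters (plus whitespace) that follows it.
function breakBeforeHan(string $text): string
{
    $result = preg_replace_callback(
        '/([^\p{Han}]+)([\p{Han}\s]+)/u',
        function (array $m): string {
            // Drop trailing ASCII whitespace from the non-Han run, keep the Han run as matched.
            return rtrim($m[1]) . '<br/>' . $m[2];
        },
        $text
    );

    // preg_replace_callback returns null on error (e.g. invalid UTF-8); fall back to the input.
    return $result ?? $text;
}

echo breakBeforeHan('Hello World 自立合作社'), "\n";
// Prints: Hello World<br/>自立合作社
```

Strings that contain no Han characters, or only Han characters, come back unchanged because the pattern requires both runs to be present.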