How to cut off a multi-byte string (English words and Chinese characters) in PHP?

Question

I have this function, which does not work quite right in PHP 5.2.0. It cuts a string down to a desired length:

function neat_trim($str, $n, $delim='...')
{
    $len = strlen($str);

    if ($len > $n)
    {
        preg_match('/(.{' . $n . '}.*?)\b/', $str, $matches);
        return rtrim($matches[1]) . $delim;
    }
    return $str;
}

I call it like this:

$multibyte_string = "Portion of Chicken for 1 person<br>一人份鸡肉";

echo neat_trim($multibyte_string,42) . "</br>";

This produces:

Portion of Chicken for 1 person
一人�...

Unfortunately, it doesn't work on PHP 5.4.29, where it produces:

...

I've tried this and this, but neither worked. Please help.

If this is *utf-8*: 1.) Instead of `$len = strlen($str);` use `mb_strlen($str, "utf-8");`. The length in characters is **40**, not **50** (the [mbstring extension](http://php.net/manual/en/book.mbstring.php) is needed). 2.) If it's Unicode, use the `u` [flag](http://php.net/manual/en/reference.pcre.pattern.modifiers.php), and you probably also want the `s` flag in your regex so that the dot also matches newlines: `'/(.{' . $n . '}.*?)\b/us'` — Jonny 5, Aug 31 '15 at 11:40
Thanks @Jonny, your comment really helped me. I'm new to handling multi-byte characters in PHP. I posted my working code. — Seto, Sep 01 '15 at 07:04
I don't think "一人�..." is a particularly positive outcome. I wouldn't call this "it works". — deceze, Sep 01 '15 at 07:26
possible duplicate of [Multibyte trim in PHP?](http://stackoverflow.com/questions/10066647/multibyte-trim-in-php) — greut, Sep 01 '15 at 08:27
@Seto I would replace the tags in `$multibyte_string` with a space before processing. The word boundary `\b` can and will break HTML tags. [See here with a few modifications](http://pastebin.com/kmMiHF5g). — Jonny 5, Sep 01 '15 at 17:10
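
To make the byte-versus-character point from the comments concrete, here is a minimal sketch using the sample string from the question. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only illustrative.

```php
<?php
// Sample string from the question: 35 ASCII characters followed by 5 Chinese characters.
$s = "Portion of Chicken for 1 person<br>一人份鸡肉";

$bytes = strlen($s);              // counts bytes (each Chinese character is 3 bytes in UTF-8)
$chars = mb_strlen($s, 'UTF-8');  // counts characters

// Without the u flag the dot matches a single byte, so .{36} stops
// after the first byte of the 3-byte sequence for the first Chinese character.
preg_match('/^.{36}/s', $s, $byteMatch);

// With the u flag the dot matches one whole UTF-8 character.
preg_match('/^.{36}/us', $s, $charMatch);

var_dump($bytes);                                    // int(50)
var_dump($chars);                                    // int(40)
var_dump(mb_check_encoding($byteMatch[0], 'UTF-8')); // bool(false): the cut split a character
var_dump($charMatch[0]);                             // ends with the complete first Chinese character
```

A cut that splits a character like this is what leaves a stray replacement character (�) in the output.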

score 1 · Accepted Answer · answered Sep 01 '15 at 07:12

Working code based on @Jonny's comment. Thanks again.

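function neat_trim($str, $n, $delim='...')
{
    // Count characters rather than bytes when the string is UTF-8.
    $len = mb_detect_encoding($str) == "UTF-8" ? mb_strlen($str, "UTF-8") : strlen($str);
    if ($len > $n)
    {
        // u: treat pattern and subject as UTF-8; s: let the dot match newlines too.
        // Only use $matches when the pattern actually matched.
        if (preg_match('/(.{' . $n . '}.*?)\b/us', $str, $matches))
        {
            return rtrim($matches[1]) . $delim;
        }
    }
    return $str;
}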
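
Following the suggestion in the comments to replace tags with a space before trimming, a variant could look like the sketch below. This is not the code from the linked pastebin; the function name `neat_trim_text`, the tag pattern, and the `mb_substr` fallback are assumptions made for illustration, and the mbstring extension is required.

```php
<?php
// Hypothetical variant of neat_trim(); the name and the tag handling are illustrative only.
function neat_trim_text($str, $n, $delim = '...')
{
    // Replace each HTML tag with a space so the cut can never land inside a tag.
    $text = trim(preg_replace('/<[^>]*>/', ' ', $str));

    // Nothing to cut if the text already fits (length counted in characters).
    if (mb_strlen($text, 'UTF-8') <= $n) {
        return $text;
    }

    // Take $n characters, then extend lazily to the next word boundary.
    if (preg_match('/^(.{' . (int) $n . '}.*?)\b/us', $text, $matches)) {
        return rtrim($matches[1]) . $delim;
    }

    // No match (for example, invalid UTF-8 input): fall back to a plain character cut.
    return mb_substr($text, 0, $n, 'UTF-8') . $delim;
}

echo neat_trim_text("Portion of Chicken for 1 person<br>一人份鸡肉", 10); // Portion of...
```

Checking the return value of `preg_match` avoids reading `$matches[1]` when the pattern did not match, which is what produces a bare `...`.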
