Here is my function that makes the first character of the first word of a sentence uppercase:

function sentenceCase($str)
{
    $cap = true;
    $ret = '';
    for ($x = 0; $x < strlen($str); $x++) {
        $letter = substr($str, $x, 1);
        if ($letter == "." || $letter == "!" || $letter == "?") {
            $cap = true;
        } elseif ($letter != " " && $cap == true) {
            $letter = strtoupper($letter);
            $cap = false;
        }
        $ret .= $letter;
    }
    return $ret;
}

It converts "sample sentence" into "Sample sentence". The problem is that it doesn't capitalize non-ASCII (multibyte UTF-8) characters such as "ü". See this example.

What am I doing wrong?

honk
heron

1 Answer

The most straightforward way to make your code UTF-8 aware is to use mbstring functions instead of the plain dumb ones in the three cases where the latter appear:

function sentenceCase($str)
{
    $cap = true;
    $ret = '';
    for ($x = 0; $x < mb_strlen($str); $x++) {      // mb_strlen instead
        $letter = mb_substr($str, $x, 1);           // mb_substr instead
        if ($letter == "." || $letter == "!" || $letter == "?") {
            $cap = true;
        } elseif ($letter != " " && $cap == true) {
            $letter = mb_strtoupper($letter);       // mb_strtoupper instead
            $cap = false;
        }
        $ret .= $letter;
    }
    return $ret;
}
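
To see why the byte-oriented functions fall short, here is a minimal sketch, assuming the input is UTF-8 encoded. The variable names are illustrative only. It compares what strlen/substr and their mbstring counterparts report for a word starting with "ü", which UTF-8 stores as two bytes:

```php
<?php
// "ü" (U+00FC) is encoded in UTF-8 as the two bytes 0xC3 0xBC.
// Writing the bytes explicitly keeps this demo independent of the file's encoding.
$word = "\xC3\xBCias";

$byteLength = strlen($word);                   // counts bytes: 5
$charLength = mb_strlen($word, 'UTF-8');       // counts characters: 4

$firstByte = substr($word, 0, 1);              // "\xC3", only half of "ü"
$firstChar = mb_substr($word, 0, 1, 'UTF-8');  // "\xC3\xBC", the whole "ü"

// Uppercasing the complete character yields "Ü" (U+00DC, bytes 0xC3 0x9C).
$upper = mb_strtoupper($firstChar, 'UTF-8');
```

Because the original loop hands strtoupper a single byte at a time, it never sees a complete "ü", so there is nothing it can capitalize.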

You can then configure mbstring to work with UTF-8 strings and you are ready to go:

mb_internal_encoding('UTF-8');
echo sentenceCase("üias skdfnsknka");
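
If changing the global setting is not desirable, one alternative is to pass the encoding to every mbstring call. The following is only a sketch: the name sentenceCaseMb and the $encoding parameter are assumptions made for illustration and are not part of the original code. It also computes the length once instead of on every loop iteration.

```php
<?php
// Hypothetical variant of sentenceCase(): the encoding is passed to each
// mbstring call, so no prior mb_internal_encoding() setup is assumed.
function sentenceCaseMb($str, $encoding = 'UTF-8')
{
    $cap = true;
    $ret = '';
    $length = mb_strlen($str, $encoding);   // computed once, in characters
    for ($x = 0; $x < $length; $x++) {
        $letter = mb_substr($str, $x, 1, $encoding);
        if ($letter == "." || $letter == "!" || $letter == "?") {
            $cap = true;                    // next non-space character starts a sentence
        } elseif ($letter != " " && $cap) {
            $letter = mb_strtoupper($letter, $encoding);
            $cap = false;
        }
        $ret .= $letter;
    }
    return $ret;
}

// Bytes are spelled out so the example does not depend on the file's encoding.
// Input is "üias skdfnsknka. été?"; prints "Üias skdfnsknka. Été?"
echo sentenceCaseMb("\xC3\xBCias skdfnsknka. \xC3\xA9t\xC3\xA9?"), "\n";
```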

Bonus solution

Specifically for UTF-8 you can also use a regular expression, which will result in less code:

$str = "üias skdfnsknka";
echo preg_replace_callback(
    '/((?:^|[!.?])\s*)(\p{Ll})/u',
    function($match) { return $match[1].mb_strtoupper($match[2], 'UTF-8'); },
    $str);
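
The regular expression can also be wrapped in a function so it is easy to reuse on text with several sentences. This is a sketch under assumptions: the name sentenceCaseRegex is hypothetical, and falling back to the unchanged input on failure is one possible design choice, not something the answer prescribes.

```php
<?php
// Hypothetical wrapper around the regular expression; the name is illustrative.
function sentenceCaseRegex($str)
{
    $result = preg_replace_callback(
        // Group 1: start of string or a sentence terminator, plus any whitespace.
        // Group 2: the lowercase letter (\p{Ll}) that follows it.
        '/((?:^|[!.?])\s*)(\p{Ll})/u',
        function ($match) {
            return $match[1] . mb_strtoupper($match[2], 'UTF-8');
        },
        $str
    );
    // preg_replace_callback() returns null on failure, for example when the
    // input is not valid UTF-8 under the /u modifier; return the input unchanged.
    return $result === null ? $str : $result;
}

// Input is "üias skdfnsknka. next one! last?"
echo sentenceCaseRegex("\xC3\xBCias skdfnsknka. next one! last?"), "\n";
```

Unlike the loop, this pattern only acts when a lowercase letter directly follows the terminator and optional whitespace, and it treats any whitespace, not just a space, as a separator.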
Jon
  • It capitalized every word, not just the first one. I need sentence case, not every word capitalized. – heron Sep 17 '13 at 08:48
  • @heron: Sorry, I didn't read the question properly. I updated the answer. – Jon Sep 17 '13 at 09:02
  • It doesn't work for me. The result is the same as the argument passed into the function. – heron Sep 17 '13 at 09:54
  • @heron: The script contains a string literal. Is it saved as UTF-8? If it is, there is no reason why this would not work. – Jon Sep 17 '13 at 10:00