I just wanted to share my experience of needing a language-independent version of ucfirst. The problem arises when you mix English text with Japanese, Chinese or other languages, in my case sometimes Swedish with ÅÄÖ: the traditional ucfirst has trouble capitalizing such strings.

Some time ago I stumbled across the following code snippet here on Stack Overflow:

function myucfirst($str) {
    $fc = mb_strtoupper(mb_substr($str, 0, 1));
    return $fc.mb_substr($str, 1);
}

It works fine in most cases, but recently I also needed the translations for autogenerated text in dynamic PDFs built with TCPDF.

This is when I started banging my head over why TCPDF had issues with the text. I had no problems anywhere else and the character encoding was UTF-8, but it still broke.

When showing kanji for Japanese, I simply skipped the above function when capitalizing the word, but then all of a sudden I hit the same breakage with Swedish, when I needed to capitalize ÅÄÖ.

That led me to conclude that the problem with the function above is that it only looks at the first character. In UTF-8, each of Å, Ä and Ö takes up 2 bytes and Chinese or Japanese kanji take up 3 bytes; the function above did not account for that, which broke TCPDF.
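
To make the byte widths concrete, here is a minimal sketch comparing byte length and character length for an ASCII letter, a Swedish letter and a kanji. The helper name byte_and_char_length is made up for this illustration, and the "\u{...}" escapes assume PHP 7 or later.

```php
<?php
// Hypothetical helper: returns [byte length, character length] of a UTF-8 string.
function byte_and_char_length(string $s): array {
    return [strlen($s), mb_strlen($s, 'UTF-8')];
}

// "\u{...}" escapes keep the example independent of the file's own encoding.
foreach (["A", "\u{00D6}", "\u{96A3}"] as $char) { // A, Ö, 隣
    [$bytes, $chars] = byte_and_char_length($char);
    echo "$char: $bytes byte(s), $chars character(s)\n";
}
```

strlen() counts bytes while mb_strlen() counts characters, so the two only agree for plain ASCII.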

To give more context: when generating PDF documents with TCPDF, the TCPDF font ends up producing errors because the general mbstring function turns the first character into "?�"vrigt for the Swedish word Övrigt and, for instance, "?��"のととろ for the Japanese 隣のトトロ (My Neighbour Totoro). The font lookup for the � then fails. You need to convert the first two bytes, substr($str, 0, 2), to capitalize ÅÄÖ properly.

Also, I am not sure whether you can see the code examples I gave. Since neither Chinese nor Japanese writing uses upper-case letters, I am excluding every character that takes 3 bytes, because those have no upper or lower case at all. I don't really want to exclude them, but passing them through mbstring leads to similar errors in TCPDF, so my examples are a workaround for now, unless someone has a better solution.

So my approach was to solve the above problem with the following function:
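
As a rough illustration of how a replacement character like � can appear, the sketch below cuts the same word once by bytes and once by characters. It only shows the general mechanism and does not claim to reproduce what TCPDF does internally.

```php
<?php
$word = "\u{00D6}vrigt"; // Övrigt

// substr() counts bytes: one byte is only half of "Ö", so the result is not valid UTF-8.
$byteCut = substr($word, 0, 1);
// mb_substr() counts characters when it is told the encoding.
$charCut = mb_substr($word, 0, 1, 'UTF-8');

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)
var_dump($charCut === "\u{00D6}");              // bool(true)
```

Passing the encoding explicitly avoids depending on whatever mb_internal_encoding() happens to be set to.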

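function myucfirst($str) {
    // Leave empty strings and strings starting with a literal "?" untouched.
    if ($str === '' || $str[0] === '?') {
        return $str;
    }
    // Grow the byte window until the conversion no longer comes back as "?",
    // which is what an incomplete UTF-8 byte sequence is turned into.
    for ($i = 1; $i <= 3; $i++) {
        $first = mb_convert_case(substr($str, 0, $i), MB_CASE_UPPER, "UTF-8");
        if ($first !== '?') {
            break;
        }
    }
    // 1- and 2-byte characters are capitalized; 3-byte characters (e.g. kanji)
    // have no case, so the string is returned unchanged.
    return $i < 3 ? $first . substr($str, $i) : $str;
}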

Thanks to Steven Penny's help below, this is the solution that works with both Swedish and Japanese/Chinese special characters, even when the string is passed to the TCPDF library for dynamically creating PDFs:

function myucfirst($str) {
    $ret_string = mb_convert_case($str, MB_CASE_TITLE, 'UTF-8');
    return $ret_string;
}
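
One thing worth knowing about MB_CASE_TITLE is that it title-cases every word and lower-cases the remaining letters, which differs from what ucfirst does on multi-word strings. If only the first character should change, a sketch along these lines may be closer; the name ucfirst_utf8 is made up for this example and is not part of the original post or of PHP.

```php
<?php
// Hypothetical helper (not from the post): upper-case only the first character.
function ucfirst_utf8(string $str): string {
    $first = mb_substr($str, 0, 1, 'UTF-8');
    $rest  = mb_substr($str, 1, null, 'UTF-8');
    return mb_strtoupper($first, 'UTF-8') . $rest;
}

echo mb_convert_case('hello wORLD', MB_CASE_TITLE, 'UTF-8'), "\n"; // Hello World
echo ucfirst_utf8('hello wORLD'), "\n";                            // Hello wORLD
echo ucfirst_utf8("\u{00F6}vrigt"), "\n";                          // Övrigt
```

For single words such as övrigt both approaches give the same result, so which one fits depends on the input.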

And here is a similar fix for ucwords:

function myucwords($str){
    $str = trim($str);
    // Capitalize each space-separated word, then join the words again.
    // A string without spaces is simply treated as a single word.
    $words = array_map('myucfirst', explode(' ', $str));
    return implode(' ', $words);
}

myucwords uses myucfirst above to capitalize each word.

I am not that experienced as a developer or as a Stack Overflow contributor. You should be able to see 3 code examples, and I would really appreciate hearing about better ways to write these functions. For now, for those who have a similar problem, please enjoy!
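
As an alternative sketch, the word splitting can also be done with a Unicode-aware regular expression, which keeps the original spacing intact. The name ucwords_utf8 is hypothetical, the arrow function assumes PHP 7.4 or later, and the pattern assumes words are separated by whitespace.

```php
<?php
// Hypothetical helper (not from the post): upper-case the first letter of each
// whitespace-separated word and leave every other character untouched.
function ucwords_utf8(string $str): string {
    $result = preg_replace_callback(
        '/(^|\s)(\p{L})/u', // a letter at the start or right after whitespace
        static fn(array $m): string => $m[1] . mb_strtoupper($m[2], 'UTF-8'),
        $str
    );
    // preg_replace_callback() returns null on failure (e.g. invalid UTF-8 input).
    return $result ?? $str;
}

echo ucwords_utf8("\u{00E5}\u{00E4}\u{00F6} \u{00F6}vrigt"), "\n"; // Åäö Övrigt
```

Characters without a case, such as kanji, pass through unchanged because mb_strtoupper() has nothing to convert.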

/Chris

Chris Lex
  • I think that on myucwords($str), initialization of $ret_str is missing («$ret_str=''» before «foreach ($str_arr as $word){») – Alberto Suárez Oct 25 '22 at 13:21

1 Answer

The examples you gave are poor: with Övrigt, the input is exactly the same as the output. So I modified the examples to make them useful. See below:

<?php
# example 1
$s1 = mb_convert_case('åäö', MB_CASE_TITLE);
# example 2
$s2 = mb_convert_case('övrigt', MB_CASE_TITLE);
# example 3
$s3 = mb_convert_case('隣のトトロ', MB_CASE_TITLE);
# print
var_dump($s1 == 'Åäö', $s2 == 'Övrigt', $s3 == '隣のトトロ');

Note that you will need this in your php.ini, if it's not already there:

extension = mbstring
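
To check whether the extension is actually active in the running PHP, a quick test like the following should do. The messages are only illustrative.

```php
<?php
// extension_loaded() reports whether a PHP extension is active in this runtime.
if (extension_loaded('mbstring')) {
    echo "mbstring is available\n";
} else {
    echo "mbstring is missing; enable it in php.ini\n";
}
```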

https://php.net/function.mb-convert-case

Zombo
  • Hi Steven, and thank you for the reply. I have added more context about the problems I experienced. On my end I do see 3 code snippets in the original text, if that's what you are referring to as examples. Since I am not used to Stack Exchange, I am not sure I posted it correctly; if you have any idea why they're not shown, please let me know – Chris Lex Nov 27 '20 at 02:09
  • So... I tried using your solution with TCPDF. With the Swedish word Övrigt you get the following results: without the UTF-8 flag, ??Vrigt (wrong); with the UTF-8 flag, Övrigt (correct). Since I need a function that also handles Japanese characters being passed in, that needs to work as well, and the only result I get from parsing 隣のトトロ is ?????. Remember that when I display the words natively in a browser, your solution works 100% of the time; it's when you mix in TCPDF that you get issues – Chris Lex Nov 27 '20 at 02:54
  • Forget what I said... I seem to have forgotten to add the correct font for Japanese. Thank you, the solution works well with Japanese characters now as well – Chris Lex Nov 27 '20 at 03:03
  • TCPDF is a library for dynamically creating PDF files with PHP, for invoices, estimates etc. I've verified that your solution works, but you would want to specify UTF-8 by adding the flag like this, "$s1 = mb_convert_case('åäö', MB_CASE_TITLE, 'UTF-8');", to get the correct result with that library – Chris Lex Nov 27 '20 at 03:08