What is an ideal way to detect whether a character is uppercase or lowercase, regardless of the current locale?

Is there a more direct function?

Assumptions: the internal character encoding is set to UTF-8, the browser session's language is en-US,en;q=0.5, and the Multibyte String extension is installed. Do not use ctype_lower or ctype_upper.

The test code below should be multibyte compatible.

$encodingtype = 'utf8';
// Code point of the input character
$charactervalue = mb_ord($character, $encodingtype);

// Lowercase form and its code point
$characterlowercase = mb_strtolower($character, $encodingtype);
$characterlowercasevalue = mb_ord($characterlowercase, $encodingtype);

// Uppercase form and its code point
$characteruppercase = mb_strtoupper($character, $encodingtype);
$characteruppercasevalue = mb_ord($characteruppercase, $encodingtype);



// Diag Info (echo the values computed above; do not reassign them inside echo)
echo 'Input: ' . $character . "<br />";
echo 'Input Value: ' . $charactervalue . "<br /><br />";
echo 'Lowercase: ' . $characterlowercase . "<br />";
echo 'Lowercase Value: ' . $characterlowercasevalue . "<br /><br />";
echo 'Uppercase: ' . $characteruppercase . "<br />";
echo 'Uppercase Value: ' . $characteruppercasevalue . "<br /><br />";
// Diag Info


if ($charactervalue == $characterlowercasevalue and $charactervalue != $characteruppercasevalue){
    $uppercase = 0;
    $lowercase = 1;
    echo 'Character is lowercase' . "<br />" . "<br />";
}

elseif ($charactervalue == $characteruppercasevalue and $charactervalue != $characterlowercasevalue ){
    $uppercase = 1;
    $lowercase = 0;
    echo 'Character is uppercase' . "<br />" . "<br />";
}

else{
    $uppercase = 0;
    $lowercase = 0;
    echo 'Character is neither lowercase or uppercase' . "<br />" . "<br />";
}
  • // Test 1 A // Output-> Character is uppercase
  • // Test 2 z // Output-> Character is lowercase
  • // Test 3 + // Output-> Character is lowercase
  • // Test 4 0 // Output-> Character is neither lowercase or uppercase
  • // Test 5 ǻ // LATIN SMALL LETTER A WITH RING ABOVE AND ACUTE // Output-> Character is lowercase
  • // Test 6 Ͱ GREEK CAPITAL LETTER HETA // Output-> Character is uppercase
  • // Test 7 '' NULL // Output-> Character is neither lowercase or uppercase
  • It seems you're doing it quite well. Do you have a particular problem with what you're using now? – KIKO Software May 03 '19 at 17:57
  • @AdityaThakur That post doesn't deal with local languages. – KIKO Software May 03 '19 at 17:59
  • If the question is "is this Unicode character uppercase", then you could just forego all this code and literally check whether that character's codepoint has the Unicode Lu property [according to the official Unicode spec](http://www.unicode.org/reports/tr44/#General_Category_Values). But that won't tell you whether something is _considered uppercase_ depending on the locale you're in, because there are a lot of languages on this planet, with a lot of orthographies, and there are a lot of exceptions to almost everything. The best solution is usually "don't use your own code, use a library" – Mike 'Pomax' Kamermans May 03 '19 at 18:00
  • @AdityaThakur This question is similar to that question [55570503], but it differs in scope. https://stackoverflow.com/questions/55570503/how-to-check-if-input-value-begins-with-an-uppercase-or-if-it-has-lowercases-o – RT.01100111 May 03 '19 at 18:07
  • @Mike'Pomax'Kamermans I will look into your suggestion. That might be what I am attempting. – RT.01100111 May 03 '19 at 18:11
  • @RT.01100111 Sorry, my bad. – Aditya Thakur May 03 '19 at 18:14
  • @RT.01100111 I have added an update to my answer that avoids any problem with the language setting. –  May 03 '19 at 20:24
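
The general-category check suggested in the comment above could be sketched roughly as follows. This is only an illustration: it assumes the intl extension may be available for IntlChar, the function name categoryCase() is hypothetical, and it falls back to PCRE's \p{Lu} and \p{Ll} properties when intl is missing. As the comment points out, this is locale-independent and says nothing about locale-specific conventions.

```php
<?php
// Hypothetical helper (not from the thread): classify a single UTF-8 character
// by its Unicode general category (Lu = uppercase letter, Ll = lowercase letter).
function categoryCase(string $char): string
{
    if (class_exists('IntlChar')) {
        // IntlChar::charType() returns the general category constant,
        // or null when the input is not a single code point.
        $type = IntlChar::charType($char);
        if ($type === IntlChar::CHAR_CATEGORY_UPPERCASE_LETTER) {
            return 'upper';
        }
        if ($type === IntlChar::CHAR_CATEGORY_LOWERCASE_LETTER) {
            return 'lower';
        }
        return 'neither';
    }

    // Fallback when intl is not installed: PCRE's Unicode properties.
    if (preg_match('~^\p{Lu}$~u', $char)) {
        return 'upper';
    }
    return preg_match('~^\p{Ll}$~u', $char) ? 'lower' : 'neither';
}

foreach (['A', 'z', '+', '0', 'ǻ', 'Ͱ'] as $char) {
    echo $char, ': ', categoryCase($char), "\n";
}
```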

1 Answer

I feel the most direct way would be to write a regex pattern to determine the character type.

In the following snippet, I'll search for uppercase letters (including unicode) in the first capture group, or lowercase letters in the second capture group. If the pattern makes no match, the character is not a letter.

A good reference for unicode letters in regex: https://regular-expressions.mobi/unicode.html

Writing two capture groups separated by a pipe means each type of letter will be slotted into a different indexed element in the output array. [0] is the fullstring match (never used in this case, but its generation is unavoidable). [1] will hold the uppercase match (or be empty when there is a lowercase match -- as a placeholding element). [2] will hold the lowercase match -- it will only be generated if there is a lowercase match.

For this reason, we can assume the highest key in the matches array will determine the casing of the letter.

If the input character is not a letter, preg_match() returns 0 (the number of matches), which is falsey. In that case, 0 is used as the lookup index to access 'neither'.
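
To make the array shapes described above concrete, here is a minimal sketch that prints what preg_match() returns and what it writes into the matches array for an uppercase letter, a lowercase letter, and a non-letter. The helper name matchShape() is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical helper: returns preg_match()'s return value together with
// the matches array it populated.
function matchShape(string $char): array
{
    $count = preg_match('~(\p{Lu})|(\p{Ll})~u', $char, $out);
    return [$count, $out];
}

foreach (['A', 'z', '+'] as $char) {
    [$count, $out] = matchShape($char);
    echo $char, ' => ', $count, ' ', json_encode($out, JSON_UNESCAPED_UNICODE), "\n";
}
// A => 1 ["A","A"]      highest key is 1 (upper); the unmatched trailing group is omitted
// z => 1 ["z","","z"]   highest key is 2 (lower); [1] is an empty placeholder
// + => 0 []             no match, so the lookup falls back to index 0
```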

Code:

$lookup = ['neither', 'upper', 'lower'];
$tests = ['A', 'z', '+', '0', 'ǻ', 'Ͱ', null];

foreach ($tests as $test) {
    $index = preg_match('~(\p{Lu})|(\p{Ll})~u', $test, $out) ? array_key_last($out) : 0;
    echo "{$test}: {$lookup[$index]}\n";
}

Output:

A: upper
z: lower
+: neither
0: neither
ǻ: lower
Ͱ: upper
: neither

For anyone who is not yet on PHP 7.3 (where array_key_last() was introduced), you can call end() then key() like this:

Code:

foreach ($tests as $test) {
    if (preg_match('~(\p{Lu})|(\p{Ll})~u', $test, $out)) {
        end($out); // advance pointer to final element
        $index = key($out);
    } else {
        $index = 0;
    }
    echo "{$test}: {$lookup[$index]}\n";
}

My first approach makes a minimum of one function call per test, and a maximum of two calls. My solution can be made into a one-liner by writing the preg_ call inside of $lookup[ and ], but I'm aiming for readability.
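
For reference, the one-liner mentioned above could look roughly like this. It is only a sketch: it assumes PHP 7.3+ for array_key_last(), and the wrapper name caseOf() is hypothetical, added so the single expression can be reused.

```php
<?php
$lookup = ['neither', 'upper', 'lower'];

// Hypothetical wrapper: the whole check is the single expression inside $lookup[...].
function caseOf(string $char, array $lookup): string
{
    return $lookup[preg_match('~(\p{Lu})|(\p{Ll})~u', $char, $out) ? array_key_last($out) : 0];
}

foreach (['A', 'z', '+', 'ǻ', 'Ͱ'] as $test) {
    echo "{$test}: ", caseOf($test, $lookup), "\n";
}
```

The trade-off is the one already noted: it is compact, but the nested call is harder to read than the two-step version.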


p.s. Here is another variation that I dreamed up. The difference is that preg_match() always makes a match because of the final empty "alternative" (empty branch).

foreach ($tests as $test) {
    preg_match('~(\p{Lu})|(\p{Ll})|~u', $test, $out);
    echo "\n{$test}: " , $lookup[sizeof($out) - 1];
}
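
As a rough illustration of why the element count works as an index in this variation: the trailing empty branch guarantees a match, so the matches array always has at least the fullstring element. The helper name alwaysMatch() is hypothetical and only used for this sketch.

```php
<?php
$lookup = ['neither', 'upper', 'lower'];

// Hypothetical helper: returns the matches array produced by the
// always-matching pattern (note the final empty branch after the last pipe).
function alwaysMatch(string $char): array
{
    preg_match('~(\p{Lu})|(\p{Ll})|~u', $char, $out);
    return $out;
}

foreach (['A', 'z', '+'] as $char) {
    $out = alwaysMatch($char);
    echo $char, ': ', count($out), ' element(s) => ', $lookup[count($out) - 1], "\n";
}
// A: 2 element(s) => upper
// z: 3 element(s) => lower
// +: 1 element(s) => neither
```

Because the empty branch already succeeds at offset 0, this pattern only ever classifies the first character of its input.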
mickmackusa