How to check if letter is upper or lower in PHP?

Question

I have UTF-8 texts that also contain diacritic characters, and I would like to check whether the first letter of the text is uppercase or lowercase. How can I do this?

@Elizabeth Buckwalter Because I derive other text from this text, and if its first letter is uppercase, then I must do the same with the second one. – Tom Smykowski, May 13 '10 at 08:55

score 52 · Answer 1 · edited Jan 08 '22 at 01:56


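function starts_with_upper($str) {
    // Take the first character (not the first byte) of the UTF-8 string
    $chr = mb_substr($str, 0, 1, "UTF-8");
    // An uppercase letter changes when it is lowercased
    return mb_strtolower($chr, "UTF-8") !== $chr;
}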

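Note that mb_substr is necessary to correctly isolate the first character: a UTF-8 letter with a diacritic spans more than one byte, so $str[0] would return only its first byte.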
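A quick way to see why the byte offset is not enough: in UTF-8, É (U+00C9) is encoded as the two bytes C3 89, so `$str[0]` returns only the first of them. A minimal sketch, assuming the PHP source file itself is saved as UTF-8:

```php
<?php
$str = 'Éé';

// strlen() counts bytes, not characters: É alone takes two bytes in UTF-8
var_dump(strlen('É'));                            // int(2)

// $str[0] is a single byte (0xC3), i.e. an incomplete character
var_dump(bin2hex($str[0]));                       // string(2) "c3"

// mb_substr() counts characters, so it returns the whole É
var_dump(mb_substr($str, 0, 1, 'UTF-8') === 'É'); // bool(true)
```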


edited Jan 08 '22 at 01:56

HoldOffHunger


answered May 11 '10 at 22:36

Artefacto



Doesn't always work. There are Unicode characters that are capital letters (i.e., category Lu) but don't have a lowercase mapping. Mostly, the mathematical bold/italic/double-struck letters. – dan04 May 13 '10 at 06:25

@dan04 That's an excellent point. On top of that, there's title case (LT). However, the mbstring extension does not expose functions to userspace to test for those properties. It's a pity because the functionality is there -- see http://svn.php.net/viewvc/php/php-src/trunk/ext/mbstring/php_unicode.h?revision=296679&view=markup – Artefacto May 13 '10 at 07:17

@dan04 How will this function handle this situation? – Tom Smykowski May 13 '10 at 08:57
To clarify, "There are over 100 lowercase letters in the Unicode Standard that have no direct uppercase equivalent." -- http://unicode.org/faq/casemap_charprop.html – mickmackusa May 05 '19 at 15:32
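To illustrate dan04's point: U+1D400 MATHEMATICAL BOLD CAPITAL A is in category Lu but has no lowercase mapping, so the mb_strtolower()-based test leaves it unchanged and reports "not uppercase", while a `\p{Lu}` regex still recognizes it. A minimal sketch that repeats the function above so it runs on its own; the `\u{...}` string escape assumes PHP 7.0 or later:

```php
<?php
// Same approach as the starts_with_upper() function in this answer
function starts_with_upper($str) {
    $chr = mb_substr($str, 0, 1, "UTF-8");
    return mb_strtolower($chr, "UTF-8") !== $chr;
}

$boldA = "\u{1D400}"; // MATHEMATICAL BOLD CAPITAL A, general category Lu

// No lowercase mapping exists, so mb_strtolower() returns it unchanged
var_dump(starts_with_upper($boldA));         // bool(false)

// The Unicode property check looks at the general category instead
var_dump(preg_match('~^\p{Lu}~u', $boldA));  // int(1)
```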

score 28 · Answer 2 · edited Jan 16 '12 at 09:36


Use ctype_upper to check for uppercase:

$a = array("Word", "word", "wOrd");

foreach($a as $w)
{
    // $w[0] is the first byte of $w (the old $w{0} syntax was removed in PHP 8.0)
    if(ctype_upper($w[0]))
    {
        print $w . PHP_EOL;
    }
}

edited Jan 16 '12 at 09:36

gmadd


answered Jun 06 '11 at 22:06

Eugen



This doesn't work with non-Latin characters, such as the Nordic ÆØÅ. – qualbeen Mar 12 '14 at 09:46

Those are [latin chars](https://en.wikipedia.org/wiki/ISO/IEC_8859-1). `ctype_upper` doesn't work with *non-ASCII* chars (including those nordic latins, as well as many other latin, and especially non-latin chars). – Sz. Oct 29 '18 at 15:12

Thank you for both comments! But the question says "UTF-8 with diacritic characters", and it works fine. If you need a function for other characters, use the answer from Artefacto. – Eugen Nov 01 '18 at 20:26
This answer is incorrect for two reasons, because you failed to test multibyte characters, which the question clearly mentions. 1. You cannot grab a multibyte character by the `0` byte offset -- you will only access the first byte of the letter. 2. `ctype_` doesn't provide the necessary multibyte support for this task. – mickmackusa May 05 '19 at 14:22

mickmackusa · Accepted Answer · 2020-01-27T02:11:13.583

In my opinion, a preg_ call is the most direct, concise, and reliable approach compared with the other solutions posted here.

echo preg_match('~^\p{Lu}~u', $string) ? 'upper' : 'lower';

My pattern breakdown:

~       # starting pattern delimiter
^       # match from the start of the input string
\p{Lu}  # match exactly one uppercase letter (Unicode-safe)
~       # ending pattern delimiter
u       # enable Unicode (UTF-8) matching
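The `u` modifier matters here. Without it, PCRE works byte by byte and tests `\p{Lu}` against the first byte alone; for é (bytes C3 A9) that byte is 0xC3, which as a code point is Ã, an uppercase letter, so the check gives a false positive. A minimal sketch, assuming a UTF-8 source file:

```php
<?php
$word = 'éa'; // lowercase é, encoded as the bytes C3 A9

// With /u the subject is decoded as UTF-8, so \p{Lu} sees U+00E9 (é, Ll)
var_dump(preg_match('~^\p{Lu}~u', $word)); // int(0)

// Without /u, \p{Lu} sees the single byte 0xC3, read as U+00C3 (Ã, Lu)
var_dump(preg_match('~^\p{Lu}~', $word));  // int(1)
```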

Notice where ctype_ and < 'a' fail in this battery of tests.

Code:

$tests = ['âa', 'Bbbbb', 'Éé', 'iou', 'Δδ'];

foreach ($tests as $test) {
    echo "\n{$test}:";
    echo "\n\tPREG:  " , preg_match('~^\p{Lu}~u', $test)      ? 'upper' : 'lower';
    echo "\n\tCTYPE: " , ctype_upper(mb_substr($test, 0, 1))  ? 'upper' : 'lower';
    echo "\n\t< a:   " , mb_substr($test, 0, 1) < 'a'         ? 'upper' : 'lower';

    $chr = mb_substr ($test, 0, 1, "UTF-8");
    echo "\n\tMB:    " , mb_strtoupper($chr, "UTF-8") == $chr ? 'upper' : 'lower';
}

Output:

âa:
    PREG:  lower
    CTYPE: lower
    < a:   lower
    MB:    lower
Bbbbb:
    PREG:  upper
    CTYPE: upper
    < a:   upper
    MB:    upper
Éé:               <-- trouble
    PREG:  upper
    CTYPE: lower  <-- uh oh
    < a:   lower  <-- uh oh
    MB:    upper
iou:
    PREG:  lower
    CTYPE: lower
    < a:   lower
    MB:    lower
Δδ:               <-- extended beyond question scope
    PREG:  upper  <-- still holding up
    CTYPE: lower
    < a:   lower
    MB:    upper  <-- still holding up

If anyone needs to differentiate between uppercase letters, lowercase letters, and non-letters, see this post.
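The same property classes extend naturally to a three-way check. A minimal sketch; the function name `first_char_case()` is hypothetical, title-case letters (Lt) deliberately fall into 'other', and invalid UTF-8 also ends up as 'other' because preg_match() returns false for it:

```php
<?php
/**
 * Hypothetical helper: classify the first character of a UTF-8 string.
 * Returns 'upper' (category Lu), 'lower' (category Ll) or 'other'
 * (digits, punctuation, uncased or title-case letters, empty string).
 */
function first_char_case(string $str): string
{
    if (preg_match('~^\p{Lu}~u', $str)) {
        return 'upper';
    }
    if (preg_match('~^\p{Ll}~u', $str)) {
        return 'lower';
    }
    return 'other';
}

foreach (['Éé', 'iou', 'Δδ', '1abc', ''] as $test) {
    echo "{$test}: ", first_char_case($test), PHP_EOL;
}
```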

It may be extending the scope of this question too far, but if your input characters are especially squirrelly (they might not belong to a category that \p{Lu} matches), you may want to check whether the first character has case variants:

\p{L&} or \p{Cased_Letter}: a letter that exists in lowercase and uppercase variants (combination of Ll, Lu and Lt).

Source: https://www.regular-expressions.info/unicode.html
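A minimal sketch of that check with a hypothetical helper name; `\p{L&}` is supported by PCRE, which PHP's preg_ functions use. It answers "does the first character have case at all?" rather than "is it uppercase?":

```php
<?php
// Hypothetical helper: true if the first character is a cased letter
// (Lu, Ll or Lt), i.e. it belongs to \p{L&}.
function starts_with_cased_letter(string $str): bool
{
    return preg_match('~^\p{L&}~u', $str) === 1;
}

var_dump(starts_with_cased_letter("\u{01C5}")); // bool(true): ǅ, title case (Lt)
var_dump(starts_with_cased_letter('中'));       // bool(false): CJK ideograph (Lo)
var_dump(starts_with_cased_letter('1'));        // bool(false): digit (Nd)
```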

To also treat the uppercase Roman numerals (category Nl, "Letter Number") that have SMALL variants as uppercase, you can add that extra range to the pattern if necessary.

https://www.fileformat.info/info/unicode/category/Nl/list.htm

Code:

echo preg_match('~^[\p{Lu}\x{2160}-\x{216F}]~u', $test) ? 'upper' : 'not upper';
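Wrapped in a function (the name is hypothetical), the extended class accepts uppercase Roman numerals such as Ⅷ (U+2167, category Nl) while still rejecting their small forms, which live at U+2170 to U+217F:

```php
<?php
// Hypothetical helper: uppercase letter (Lu) or one of the uppercase
// Roman numerals U+2160..U+216F (category Nl)
function starts_with_upper_or_roman(string $str): bool
{
    return preg_match('~^[\p{Lu}\x{2160}-\x{216F}]~u', $str) === 1;
}

var_dump(preg_match('~^\p{Lu}~u', "\u{2167}"));   // int(0): Ⅷ is Nl, not Lu
var_dump(starts_with_upper_or_roman("\u{2167}")); // bool(true): Ⅷ
var_dump(starts_with_upper_or_roman("\u{2177}")); // bool(false): ⅷ
var_dump(starts_with_upper_or_roman('Éé'));       // bool(true)
```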

Special thanks to [@Wiktor](https://stackoverflow.com/users/3832970/wiktor-stribiżew) for helping me to find these characters at fileformat.info. — mickmackusa, Jan 27 '20 at 02:13

score 11 · Answer 4 · answered May 11 '10 at 22:32


Have you tried this?

$str = 'the text to test';
// Curly-brace offsets ($str{0}) were removed in PHP 8.0; use $str[0]
if ($str[0] === strtoupper($str[0])) {
    echo "yep, it's uppercase";
}
else {
    echo "nope, it's not uppercase";
}

answered May 11 '10 at 22:32

Vidar Vestnes


$str{0} is the same as $str[0]. Sometimes substr(string, start, length) is useful with start or length being negative. – karmakaze Sep 13 '12 at 06:43
This answer is incorrect for two reasons, because you failed to test multibyte characters, which the question clearly mentions. 1. You cannot grab a multibyte character by the `0` byte offset -- you will only access the first byte of the letter. 2. `strtoupper` doesn't provide the necessary multibyte support for this task. – mickmackusa May 05 '19 at 14:23

score 5 · Answer 5 · answered May 14 '13 at 09:59


As used in the Kohana 2 autoloader function:

echo $char < 'a' ? 'uppercase' : 'lowercase';

When two non-numeric strings are compared with <, PHP compares them byte by byte, so for a single ASCII character this amounts to comparing ASCII codes. In the ASCII table, control characters, digits, and some punctuation come first, then the uppercase Latin letters, then a few more punctuation characters, and then the lowercase Latin letters. Thus you can easily check whether the code of a letter is smaller or bigger than that of the lowercase Latin a.

BTW, this is around twice as fast as a solution with regular expressions.
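A short sketch of what that comparison does in practice, assuming UTF-8 input: because the comparison is byte-wise, anything whose first byte is below 0x61 counts as "uppercase", and any multibyte UTF-8 character (first byte 0xC2 or higher) counts as "lowercase":

```php
<?php
// Byte-wise string comparison: 'A' (0x41) and 'Z' (0x5A) sort before 'a' (0x61)
var_dump('A' < 'a'); // bool(true)
var_dump('Z' < 'a'); // bool(true)

// ...but so do digits and several punctuation characters
var_dump('7' < 'a'); // bool(true)
var_dump('_' < 'a'); // bool(true)

// É starts with byte 0xC3, which is greater than 0x61
var_dump('É' < 'a'); // bool(false)
```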

answered May 14 '13 at 09:59

Haralan Dobrev


this is the fastest even in utf – Shady Keshk Jan 05 '17 at 00:01
This answer is incorrect because you failed to test multibyte characters as the question clearly states. – mickmackusa May 05 '19 at 14:25

score 5 · Answer 6 · edited Mar 26 '14 at 15:08

Note that PHP provides the ctype family of functions, such as ctype_upper.

You have to set the locale correctly via setlocale() first to get it to work with UTF-8.
See the comment on ctype_alpha, for instance.

Usage:

if ( ctype_upper( $str[0] )) {
    // deal with 1st char of $str is uppercase
}

edited Mar 26 '14 at 15:08

answered Nov 22 '10 at 09:26

Déjà vu


Doesn't work on UTF-8. That comment on php.net has -2 (down) votes. Try: `setlocale(LC_ALL, 'ru_RU.utf-8'); return ctype_upper('П') === false;` – DUzun Oct 24 '18 at 10:25
Getting the `setLocale()` setting to be correct in a dynamic environment can be a hassle. More importantly, you cannot access a whole multibyte character by the first byte offset. This answer is incorrect/unstable. https://3v4l.org/38R6f – mickmackusa May 05 '19 at 14:39

score 4 · Answer 7 · edited Jan 16 '12 at 09:36


I didn't want digits and other non-letters to count as uppercase characters, so I use:

if (preg_match('/[A-Z]$/', $char) === 1)
{
   // this must be an upper char ('$' anchors the match at the end of $char)
   echo $char;
}
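Because `$` anchors the pattern at the end of the subject, it inspects the last character. A minimal comparison with `^`, which tests the first character instead; both patterns remain ASCII-only:

```php
<?php
$word = 'aB';

// '$' anchors at the end: the last character 'B' is matched
var_dump(preg_match('/[A-Z]$/', $word)); // int(1)

// '^' anchors at the start: the first character 'a' is not in A-Z
var_dump(preg_match('/^[A-Z]/', $word)); // int(0)

// Neither pattern covers letters outside A-Z, such as É
var_dump(preg_match('/^[A-Z]/', 'Éé'));  // int(0)
```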

edited Jan 16 '12 at 09:36

gmadd


answered Jan 31 '11 at 21:20

Dimmen


This answer is incorrect because: 1. It is not checking the first character, it is checking the last character. 2. It is not attempting to match multibyte characters as the question clearly states. – mickmackusa May 05 '19 at 14:27

score 2 · Answer 8 · answered Jul 21 '11 at 03:06


What about just:

if (ucfirst($string) == $string) {dosomething();}
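`ucfirst()` works on the first byte only (ASCII-only since PHP 8.2, locale-dependent before that), so a multibyte lowercase first letter is left alone and the comparison wrongly reports it as uppercase. A sketch contrasting it with a multibyte-aware variant; `ucfirst_check_mb()` is a hypothetical helper, not a built-in:

```php
<?php
// Hypothetical multibyte-aware version of the ucfirst() comparison
function ucfirst_check_mb(string $str): bool
{
    $first = mb_substr($str, 0, 1, 'UTF-8');
    $rest  = mb_substr($str, 1, null, 'UTF-8');
    return mb_strtoupper($first, 'UTF-8') . $rest === $str;
}

$word = 'éa'; // starts with a lowercase é

// ucfirst() does not touch the multibyte é, so the string looks "unchanged"
var_dump(ucfirst($word) === $word); // bool(true), a false positive

// The multibyte version uppercases é to É, so the strings differ
var_dump(ucfirst_check_mb($word));  // bool(false)
var_dump(ucfirst_check_mb('Éé'));   // bool(true)
```

Like the original, both versions treat a non-letter first character, such as a digit, as "uppercase", because uppercasing leaves it unchanged.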

answered Jul 21 '11 at 03:06

Tony


No. This is inappropriate/incorrect for the question asked. https://3v4l.org/1GpYX – mickmackusa May 05 '19 at 14:43

score 2 · Answer 9 · answered Jul 29 '11 at 20:27


If you want it in a nice function, I've used this:

function _is_upper($in_string)
{
    return $in_string === strtoupper($in_string);
}

Then just call:

if (_is_upper($mystring))
{
  // Do....
}
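`strtoupper()` has the same single-byte limitation, so for UTF-8 text a multibyte variant of the same idea would call `mb_strtoupper()` instead. A sketch; `_is_upper_mb()` is a hypothetical name, and like the original it checks the whole string, not only its first letter:

```php
<?php
// _is_upper() as above, returning the comparison directly
function _is_upper($in_string)
{
    return $in_string === strtoupper($in_string);
}

// Hypothetical multibyte variant: mb_strtoupper() also maps é, δ, etc.
function _is_upper_mb(string $in_string): bool
{
    return $in_string === mb_strtoupper($in_string, 'UTF-8');
}

var_dump(_is_upper('Éé'));    // bool(true): strtoupper() leaves the bytes of é alone
var_dump(_is_upper_mb('Éé')); // bool(false)
var_dump(_is_upper_mb('ÉÉ')); // bool(true)
```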

answered Jul 29 '11 at 20:27

Kver


Your solution is inappropriate/incorrect for the question asked. Your solution does not provide support for "diacritic characters" as clearly stated in the question. – mickmackusa May 05 '19 at 14:47

score 1 · Answer 10 · edited Dec 28 '20 at 22:00


Another possible solution in PHP 7 and later is to use IntlChar.

IntlChar provides access to a number of utility methods that can be used to access information about Unicode characters.

$tests = ['âa', 'Bbbbb', 'Éé', 'iou', 'Δδ'];

foreach ($tests as $test) {
    echo "{$test}:\t";
    echo IntlChar::isUUppercase(mb_substr($test, 0, 1)) ? 'upper' : 'lower';
    echo PHP_EOL; 
}

Output:

âa:     lower
Bbbbb:  upper
Éé:     upper
iou:    lower
Δδ:     upper

While @mickmackusa's first pattern (~^\p{Lu}~u) is good, it gives the wrong result for characters that are uppercase but belong to a general category other than Lu (Uppercase Letter). Note: he has since extended the pattern at the bottom of his answer to include Roman numerals.

For example:

Ⅷ => ⅷ
Ⅼ => ⅼ
Ⅿ => ⅿ
Ⅾ => ⅾ
Ⅽ => ⅽ

var_dump(preg_match('~^\p{Lu}~u', 'Ⅷ') ? 'upper' : 'lower'); // Result: lower
var_dump(preg_match('~^\p{Lu}~u', 'ⅷ') ? 'upper' : 'lower'); // Result: lower

But

var_dump(IntlChar::isUUppercase(mb_substr('Ⅷ', 0, 1)) ? 'upper' : 'lower'); // Result: upper
var_dump(IntlChar::isUUppercase(mb_substr('ⅷ', 0, 1)) ? 'upper' : 'lower'); // Result: lower

Make sure to use IntlChar::isUUppercase rather than IntlChar::isupper if you want to detect characters that are uppercase but belong to a general category other than Lu.

Note: IntlChar requires the intl (Internationalization) extension.

edited Dec 28 '20 at 22:00

mickmackusa


answered Jan 22 '20 at 22:15

Rain


@mickmackusa True, I like your approach and I think it's slightly faster. But I wouldn't use regex for such a simple task. – Rain Jan 22 '20 at 22:22
You'd rather use a `mb_` function than a library-dependent class method? Okay, your choice. I always favor regex when it provides the most direct approach AND sensible performance isn't lost. It is good to provide choices to researchers. – mickmackusa Jan 22 '20 at 22:24
@mickmackusa Yes if that increases my code readability. Again your solution is great but without your pattern breakdown it may take a little more time for someone stupid like me to get it. – Rain Jan 22 '20 at 22:36
I'll investigate your comparison after work. Thanks for pinging me. Are you saying that my answer fails on Roman Numerals? – mickmackusa Jan 23 '20 at 00:29
@mickmackusa Yes, some of the Roman numerals and also some characters in the So category have case variants. And I think it would be useful to take care of those cases too. – Rain Jan 23 '20 at 03:07
Definitely upvote-worthy. Thanks for sharing the difference. (I never found time to investigate a pattern change that would cover those fringe cases.) – mickmackusa Jan 26 '20 at 10:51

score 0 · Answer 11 · answered Jan 19 '13 at 11:22


if (ctype_upper($value)) {
    echo 'uppercase';
}
else {
    echo 'not upper case';
}

answered Jan 19 '13 at 11:22

Sumith Harshan


`ctype_` doesn't provide the necessary multibyte support for this task. The OP is very clear about needing to process "diacritic characters". This code-only answer is incorrect/inappropriate. – mickmackusa May 05 '19 at 14:45
