The problem with using an A-Z test in a regular expression is that A-Z are not the only uppercase letters.
Consider the city of Überlingen in Germany. The first letter certainly is uppercase, but it is not in the range A to Z. Try it in the JavaScript console:
/^[A-Z]/.test('Überlingen'); // logs false - oops!
Now here is where it gets a bit tricky. What exactly does it mean for a letter to be uppercase? In English it's simple: A-Z vs. a-z. In German, Ü (for example) is uppercase and ü is lowercase. For languages like these that have both uppercase and lowercase characters, you can test if a character is uppercase by converting it to lowercase with the .toLowerCase() method and comparing that with the original. If they are different, the original was uppercase. If they are the same, the original was either a lowercase character or a character that doesn't have uppercase and lowercase versions (e.g. a number or punctuation mark).
// 'char' is a string containing a single character
function isUpperCase( char ) {
return char !== char.toLowerCase();
}
Now you can test if the first character of a string is uppercase by extracting that character with .charAt() and calling isUpperCase():
function beginsWithUpperCase( string ) {
return isUpperCase( string.charAt(0) );
}
This works correctly for the German city:
beginsWithUpperCase( 'Überlingen' ); // logs `true`.
And now, since we're not using a regular expression at all, if you want to check the string length, merely use the .length property:
function fourCharactersWithFirstUpperCase( string ) {
return string.length === 4 && beginsWithUpperCase( string );
}
fourCharactersWithFirstUpperCase( 'über' ); // logs false
fourCharactersWithFirstUpperCase( 'Über' ); // logs true
fourCharactersWithFirstUpperCase( 'Überlingen' ); // logs false
So we're in good shape for languages that have both uppercase and lowercase versions of the same character. But what about languages that don't have uppercase vs. lowercase characters? Then this code would return false for any string.
I don't have a good solution for that off the top of my head; you'd have to think about how you want to handle that case.
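One possible starting point, offered only as a sketch: detect the caseless situation explicitly and let the caller decide what to do with it. The helper names hasCase() and firstCharCase() are made up for this illustration, and the sketch assumes that a character has case exactly when its .toLowerCase() and .toUpperCase() results differ.

```javascript
// Hypothetical helper (not part of the code above): reports whether a
// single character has distinct uppercase and lowercase forms.
function hasCase( char ) {
    return char.toLowerCase() !== char.toUpperCase();
}

// Returns true or false when the first character is cased, and null when
// it has no case at all (or the string is empty), so the caller can choose
// how to treat that situation.
function firstCharCase( string ) {
    var first = string.charAt(0);
    if( ! hasCase( first ) ) return null;
    return first !== first.toLowerCase();
}

firstCharCase( 'Überlingen' ); // true
firstCharCase( 'über' );       // false
firstCharCase( '東京' );        // null
firstCharCase( '42' );         // null
```

Returning null instead of false keeps "lowercase" and "no case at all" apart; whether a caseless first character should pass or fail your check is still a decision for your application.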
BTW if you really want to try this with a regular expression, there's a possible approach in this answer. Instead of just testing for A-Z, you could list all of the uppercase letters in the languages you may have to deal with. Adapting the regex from that answer, it might look like this:
function beginsWithUpperCase( string ) {
return /^[A-ZÀÈÌÒÙÁÉÍÓÚÝÂÊÎÔÛÃÑÕÄËÏÖÜÇØÅÆÞÐ]/.test( string );
}
Of course that raises the question of whether we've accurately listed all of the uppercase characters for every language!
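If your JavaScript engine supports Unicode property escapes (added in ES2018, so this is an assumption about your environment), you may be able to avoid the hand-written list and ask the regex engine for "any uppercase letter" instead. Here is a minimal sketch; the function name beginsWithUppercaseLetter() is made up for this example:

```javascript
// \p{Lu} matches any character in the Unicode "Uppercase Letter" category.
// The u flag is required for \p{...} to be recognized.
function beginsWithUppercaseLetter( string ) {
    return /^\p{Lu}/u.test( string );
}

beginsWithUppercaseLetter( 'Überlingen' ); // true
beginsWithUppercaseLetter( 'über' );       // false
beginsWithUppercaseLetter( '東京' );        // false
```

Like the .toLowerCase() approach, this returns false for scripts that have no uppercase letters, so it doesn't settle the question of how to handle those.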