9

I'm about to write a parser for a language that is supposed to have strict syntactic rules about the naming of types, variables and so on. For example, all class names must be PascalCase, and all variable names, parameter names and other identifiers must be camelCase.

For example HTMLParser is not allowed and must be named HtmlParser. Any ideas for a regexp that can match something that is PascalCase, but does not have two capital letters in it?

Alix Axel
Marcin
  • 2
    I believe that last sentence should be "...but does not have two **consecutive** capital letters in it?" – Chris Lutz Jan 20 '10 at 17:54
  • 4
    Suppose I want to write a C preprocessor in that language. Must I name my class Cpreprocessor? Are underscores (C_Preprocessor) allowed? – Wayne Conrad Jan 20 '10 at 17:55
  • 4
    Would `H` be a valid class name? – Greg Bacon Jan 20 '10 at 17:56
  • @Chris yeah, it should not have 2 consecutive capital letters in it. C_preprocessor is not allowed, it'd have to be PreprocessorForC or something similar. – Marcin Jan 22 '10 at 09:30

7 Answers

21

camelCase:

^[a-z]+(?:[A-Z][a-z]+)*$

PascalCase:

^[A-Z][a-z]+(?:[A-Z][a-z]+)*$
Alix Axel
  • Neither of the above worked for me for some reason. The following did, however: (?:[a-z]+|[A-Z]+|^)([a-z]|\d)* (remove the |\d if you don't want numbers included). – Emile Oct 11 '20 at 15:58
  • This doesn't capture capitals at the end, e.g. ModeA. It also doesn't allow 2 capital letters in a row (which is generally accepted, e.g. CreateAMode, CreateBMode) – alianos- Apr 08 '22 at 09:45
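
As a quick way to try the two patterns from the answer above, here is a minimal Python sketch. Python and the helper names `is_camel_case` and `is_pascal_case` are my own choices for illustration, not part of the answer. It also reproduces the limitation raised in the comments: names such as ModeA or CreateAMode are rejected.

```python
import re

# Patterns copied from the answer above.
CAMEL_CASE = re.compile(r"^[a-z]+(?:[A-Z][a-z]+)*$")
PASCAL_CASE = re.compile(r"^[A-Z][a-z]+(?:[A-Z][a-z]+)*$")


def is_camel_case(name: str) -> bool:
    """True if the whole name matches the camelCase pattern."""
    return CAMEL_CASE.fullmatch(name) is not None


def is_pascal_case(name: str) -> bool:
    """True if the whole name matches the PascalCase pattern."""
    return PASCAL_CASE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["HtmlParser", "HTMLParser", "ModeA", "H", "htmlParser"]:
        print(candidate, is_pascal_case(candidate), is_camel_case(candidate))
```

Every word must contain at least one lowercase letter after its capital, which is why a single-letter name such as H does not match either.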
4

^[A-Z][a-z]*([A-Z][a-z]*)

This should work for names of this type:

  1. MadeEasy
  2. WonderFul
  3. AndMe

3
/([A-Z][a-z]+)*[A-Z][a-z]*/

But I have to say your naming choice stinks; HTMLParser should be allowed and preferred.

  • +1 for a regex and a comment on the naming convention that both look suspiciously similar to what I was going to post, though I would simplify the regex to `/(?:[A-Z][a-z]+)+/` (I don't think the OP is concerned with allowing `AaA` as a class name). – Chris Lutz Jan 20 '10 at 17:54
  • 1
    Yeah, I considered that, but figured AaA doesn't have two consecutive uppercase letters. A bigger problem not yet addressed by this scheme is numbers, do they count as upper, lower, neither, or both? –  Jan 20 '10 at 17:58
  • It's missing some details - like numbers, other than that it seems to work. – Marcin Jan 22 '10 at 09:31
3

I don't believe identifiers can start with a digit (I think I read that somewhere, so take it with a grain of salt), so in my opinion the best option is something like Roger Pate's regex with a few minor modifications:

/^([A-Z][a-z0-9]+)*[A-Z][a-z0-9]*$/

In words: zero or more words, each a capital letter followed by at least one lowercase letter or digit, and then a final capital letter followed by any number of lowercase letters or digits. A single capital letter matches on its own, since only that final capital is required and everything after it is optional.
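
To see what the pattern above accepts, here is a small Python sketch. The language choice and the function name `is_pascal_with_digits` are assumptions made for illustration; the pattern itself is the one from this answer with the surrounding slashes dropped.

```python
import re

# Pattern from this answer: digits count as lowercase, but a name
# cannot start with a digit and cannot contain two capitals in a row.
PASCAL_WITH_DIGITS = re.compile(r"^([A-Z][a-z0-9]+)*[A-Z][a-z0-9]*$")


def is_pascal_with_digits(name: str) -> bool:
    """True if the whole name matches the pattern above."""
    return PASCAL_WITH_DIGITS.fullmatch(name) is not None


if __name__ == "__main__":
    samples = ["H", "Html5Parser", "ModeA", "AaA", "HTMLParser", "2Fast"]
    for candidate in samples:
        print(candidate, is_pascal_with_digits(candidate))
```

Because the final capital may stand alone, this variant accepts H and ModeA while still rejecting HTMLParser.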

Good luck

aleung
onaclov2000
1
^[A-Z]{1,2}([a-z]+[A-Z]{0,2})*$

This allows up to two consecutive capital letters, which is generally accepted. Unfortunately, PascalCase has no formal specification, so there is no authoritative rule to check against.

alianos-
0

Although the original post specifically excluded two consecutive capital (uppercase) letters, I'd like to post a regex for PascalCase that addresses many of the points raised in the comments:

  • Allowing two or more consecutive capital letters (the run length is not capped, so HELLO matches)
  • Allowing digits (but not as the leading character in the string)
  • Allowing a string ending with a capital letter or a digit

The regex is ^[A-Z][a-z0-9]*(?:[A-Z][a-z0-9]*)*(?:[A-Z]?)$

When tested against all strings raised in all comments, the following match as PascalCase:

PascalCase
Pascal2Case
PascalCaseA
Pascal2CaseA
ModeA
Mode2A
Mode2A2
Mode2A2A
CreateAMode
CreateBMode
MadeEasy
WonderFul
AndMe
Context
HTMLParser
HtmlParser
H
AaA
HELLO

The following do not match as PascalCase:

camelCase
2PascalCase
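
The two lists above can be re-checked with a short Python sketch. The language and the names `is_pascal_case`, `SIMPLER` and `mismatches` are my own assumptions for illustration. As far as I can tell, the pattern accepts any ASCII letters-and-digits string that starts with a capital, so the sketch also compares it with that shorter form on the same samples.

```python
import re

# Pattern copied from this answer.
PASCAL = re.compile(r"^[A-Z][a-z0-9]*(?:[A-Z][a-z0-9]*)*(?:[A-Z]?)$")
# Assumption of this sketch: a shorter pattern that seems equivalent.
SIMPLER = re.compile(r"^[A-Z][A-Za-z0-9]*$")

SHOULD_MATCH = [
    "PascalCase", "Pascal2Case", "PascalCaseA", "Pascal2CaseA", "ModeA",
    "Mode2A", "Mode2A2", "Mode2A2A", "CreateAMode", "CreateBMode",
    "MadeEasy", "WonderFul", "AndMe", "Context", "HTMLParser",
    "HtmlParser", "H", "AaA", "HELLO",
]
SHOULD_NOT_MATCH = ["camelCase", "2PascalCase"]


def is_pascal_case(name: str) -> bool:
    """True if the whole name matches the answer's pattern."""
    return PASCAL.fullmatch(name) is not None


def mismatches() -> list:
    """Return samples whose result differs from the lists in the answer."""
    wrong = [n for n in SHOULD_MATCH if not is_pascal_case(n)]
    wrong += [n for n in SHOULD_NOT_MATCH if is_pascal_case(n)]
    return wrong


if __name__ == "__main__":
    print("mismatches:", mismatches())
    for name in SHOULD_MATCH + SHOULD_NOT_MATCH:
        # Compare the answer's pattern with the shorter candidate.
        same = is_pascal_case(name) == (SIMPLER.fullmatch(name) is not None)
        print(name, is_pascal_case(name), "same as SIMPLER:", same)
```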
Moshe Rubin
0

Lower Camel Case - no digits allowed


    ^[a-z][a-z]*(([A-Z][a-z]+)*[A-Z]?|([a-z]+[A-Z])*|[A-Z])$
    

Test Cases: https://regex101.com/library/4h7A1I

Lower Camel Case - digits allowed


    ^[a-z][a-z0-9]*(([A-Z][a-z0-9]+)*[A-Z]?|([a-z0-9]+[A-Z])*|[A-Z])$

Test Cases: https://regex101.com/library/8nQras

Pascal Case - no digits allowed


    ^[A-Z](([a-z]+[A-Z]?)*)$

Test Cases: https://regex101.com/library/sF2jRZ

Pascal Case - digits allowed


    ^[A-Z](([a-z0-9]+[A-Z]?)*)$

Test Cases: https://regex101.com/library/csrkQw
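
The four patterns above can be wrapped in one helper, sketched here in Python. The function `check_name`, its parameters and the dictionary keys are hypothetical names chosen for this example; only the regexes themselves come from the answer.

```python
import re

# Patterns copied from the answer; keys are (style, digits allowed).
PATTERNS = {
    ("camel", False): r"^[a-z][a-z]*(([A-Z][a-z]+)*[A-Z]?|([a-z]+[A-Z])*|[A-Z])$",
    ("camel", True): r"^[a-z][a-z0-9]*(([A-Z][a-z0-9]+)*[A-Z]?|([a-z0-9]+[A-Z])*|[A-Z])$",
    ("pascal", False): r"^[A-Z](([a-z]+[A-Z]?)*)$",
    ("pascal", True): r"^[A-Z](([a-z0-9]+[A-Z]?)*)$",
}
COMPILED = {key: re.compile(pattern) for key, pattern in PATTERNS.items()}


def check_name(name: str, style: str, allow_digits: bool = False) -> bool:
    """True if name fully matches the pattern for the given style."""
    return COMPILED[(style, allow_digits)].fullmatch(name) is not None


if __name__ == "__main__":
    print(check_name("htmlParser", "camel"))
    print(check_name("html5Parser", "camel", allow_digits=True))
    print(check_name("ModeA", "pascal"))
    print(check_name("HTMLParser", "pascal"))
```

These patterns accept a trailing capital (ModeA, modeA) but still reject two capitals in a row. Groups such as `([a-z]+[A-Z]?)*` nest one quantifier inside another, so a backtracking engine may slow down noticeably on long inputs that do not match.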

For more details on camel case and pascal case check out this repo.

rouble