How do I find all PascalCased words in a document with a regular expression?

In case you don't know the term PascalCase: I mean upper camel case (i.e., camel-cased words in which the first letter is capitalized).

Jonathan Simonney
Tom Lehman
  • I just want to point out that what you are describing is PascalCase. CamelCase refers specifically to words where the first letter is lowercase and all subsequent words begin with an uppercase letter. – Jens Bodal Nov 01 '17 at 22:01
  • Definitely PascalCase this is camelCase – Jacob Dec 29 '18 at 05:56

9 Answers

([A-Z][a-z0-9]+)+

Assuming English. Use appropriate character classes if you want it internationalizable. This will match words such as "This". If you want to only match words with at least two capitals, just use

([A-Z][a-z0-9]+){2,}

UPDATE: As I mentioned in a comment, a better version is:

[A-Z]([A-Z0-9]*[a-z][a-z0-9]*[A-Z]|[a-z0-9]*[A-Z][A-Z0-9]*[a-z])[A-Za-z0-9]*

It matches strings that start with an uppercase letter, contain only letters and numbers, and contain at least one lowercase letter and at least one other uppercase letter.
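To compare the three patterns above on running text, here is a minimal Python sketch. The helper name `find_words`, the sample sentence, and the `\b` word-boundary wrapper are my own assumptions and not part of the answer; the wrapper is only there so that a match cannot start or stop in the middle of a word.

```python
import re

# Patterns copied verbatim from the answer above.
SIMPLE = r"([A-Z][a-z0-9]+)+"
TWO_HUMPS = r"([A-Z][a-z0-9]+){2,}"
MIXED = r"[A-Z]([A-Z0-9]*[a-z][a-z0-9]*[A-Z]|[a-z0-9]*[A-Z][A-Z0-9]*[a-z])[A-Za-z0-9]*"


def find_words(pattern, text):
    """Return every whole word in text that matches pattern."""
    # Assumption: \b anchors are added so matches cover complete words only.
    wrapped = re.compile(r"\b(?:" + pattern + r")\b")
    return [m.group(0) for m in wrapped.finditer(text)]


text = "This PascalCase word sits beside HTTPConnection, NASA and camelCase."

print(find_words(SIMPLE, text))     # single capitalized words count too
print(find_words(TWO_HUMPS, text))  # at least two "humps"
print(find_words(MIXED, text))      # also accepts a leading run of capitals
```

On this sample, only the last pattern picks up HTTPConnection, and none of them picks up NASA or camelCase.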

edgerunner
Adam Crume
  • What about words with a subsequence of uppercase characters or ending with an uppercase character? – ephemient Jul 15 '09 at 02:04
  • If you want to match only words with more than one uppercase character, it'd be something like this: ([A-Z][a-z0-9]*){2,} – Adam Crume Jul 15 '09 at 12:43
  • Right, but that matches all-uppercase words too, which (IMO) shouldn't be considered CamelCase. – ephemient Jul 15 '09 at 14:27
  • Okay, then: [A-Z]([A-Z0-9]*[a-z][a-z0-9]*[A-Z]|[a-z0-9]*[A-Z][A-Z0-9]*[a-z])[A-Za-z0-9]* It matches strings that start with an uppercase letter, contain only letters and numbers, and contain at least one lowercase letter and at least one other uppercase letter. – Adam Crume Jul 15 '09 at 17:50

Lower camel case

This regex allows digits and implements strict lower camel case as defined by the Google Java Style Guide.

[a-z]+((\d)|([A-Z0-9][a-z0-9]+))*([A-Z])?
  1. The name starts with one or more lower-case letters.
  2. Each following element is either a single digit, or an upper-case letter or digit followed by one or more lower-case letters or digits.
  3. The last character may be a single upper-case letter.

The following names are all valid under this regex:

xmlHttpRequest
newCustomerId
innerStopwatch
supportsIpv6OnIos
youTubeImporter
youtubeImporter
affine3D

Upper camel case

The same principle as for lower camel case, except that the name always starts with an upper-case character.

([A-Z][a-z0-9]+)((\d)|([A-Z0-9][a-z0-9]+))*([A-Z])?

The following names are all valid under this regex:

XmlHttpRequest
NewCustomerId
InnerStopwatch
SupportsIpv6OnIos
YouTubeImporter
YoutubeImporter
Affine3D
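
The two patterns in this answer are meant to validate a whole identifier, so a quick check could look like the following Python sketch. The function names and the two negative examples (`XMLHttpRequest`, `newCustomerID`) are my own assumptions, added to show what "strict" rules out; `fullmatch` is used so the entire name has to match.

```python
import re

# Patterns copied verbatim from the answer above.
LOWER_CAMEL = re.compile(r"[a-z]+((\d)|([A-Z0-9][a-z0-9]+))*([A-Z])?")
UPPER_CAMEL = re.compile(r"([A-Z][a-z0-9]+)((\d)|([A-Z0-9][a-z0-9]+))*([A-Z])?")


def is_lower_camel(name):
    """True if the whole name is strict lower camel case."""
    return LOWER_CAMEL.fullmatch(name) is not None


def is_upper_camel(name):
    """True if the whole name is strict upper camel case."""
    return UPPER_CAMEL.fullmatch(name) is not None


for name in ["xmlHttpRequest", "affine3D", "newCustomerID", "XMLHttpRequest"]:
    print(name, is_lower_camel(name), is_upper_camel(name))
```

Consecutive capitals such as `ID` or `XML` are rejected by both patterns, which is why the valid examples spell them `Id` and `Xml`.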
Nicolas Henneaux

The regexp that solved my problem (properly naming directories that will be recognized by FitNesse DbFit web service) is:

(^[A-Z][a-z0-9]+[A-Z]$)|(^[A-Z][a-z0-9]+([A-Z][a-z0-9]+)+$)|(^[A-Z][a-z0-9]+([A-Z][a-z0-9]+)+[A-Z]$) 

I reverse-engineered these particular CamelCase rules:

1. First character uppercase alpha
2. Next 1-n characters lowercase alphanumeric
3. Next character (n+1) uppercase alpha
4. Next 0 or more characters lowercase alphanumeric
No consecutive uppercase; no special characters.
Pattern may be repeated, e.g. NoChildLeftBehindSuite9102

The expression passed my testing as follows:

Camel01C is CamelCase syntax
Camel01c01 is not CamelCase syntax
Camel01C01 is CamelCase syntax
Camel01CC01 is not CamelCase syntax
Camel0a1c1 is not CamelCase syntax
Camel0a1C1 is CamelCase syntax
Camel0ac1b1C1 is CamelCase syntax
CamelC is CamelCase syntax
CamelC1 is CamelCase syntax
CamelCA is not CamelCase syntax
CamelCa1 is CamelCase syntax
CamelCa_1 is not CamelCase syntax
IbsReleaseTestVerificationRegressionSuite is CamelCase syntax
IbsReleaseTestVerificationRegressioNSuite is not CamelCase syntax
IbsReleaseTestVerificationRegressioN is CamelCase syntax
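
For anyone who wants to re-run part of this table, here is a small Python sketch. The names `FITNESSE_NAME` and `EXPECTED` are hypothetical; the pattern is the one from this answer split over three lines, and the expectations are copied from the results listed above plus the `NoChildLeftBehindSuite9102` example.

```python
import re

# The three alternatives from the answer, concatenated into one pattern.
FITNESSE_NAME = re.compile(
    r"(^[A-Z][a-z0-9]+[A-Z]$)"
    r"|(^[A-Z][a-z0-9]+([A-Z][a-z0-9]+)+$)"
    r"|(^[A-Z][a-z0-9]+([A-Z][a-z0-9]+)+[A-Z]$)"
)

# Expected results taken from the answer's own test output.
EXPECTED = {
    "Camel01C": True,
    "Camel01c01": False,
    "Camel01CC01": False,
    "CamelC": True,
    "CamelCA": False,
    "CamelCa_1": False,
    "IbsReleaseTestVerificationRegressioNSuite": False,
    "IbsReleaseTestVerificationRegressioN": True,
    "NoChildLeftBehindSuite9102": True,
}

for name, expected in EXPECTED.items():
    actual = FITNESSE_NAME.match(name) is not None
    verdict = "is" if actual else "is not"
    print(f"{name} {verdict} CamelCase syntax")
    assert actual == expected
```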
Billy Baroo

Adam Crume's regex is close, but it won't match, for example, IFoo or HTTPConnection. I'm not sure about the others, but give this one a try:

\b[A-Z][a-z]*([A-Z][a-z]*)*\b

The same caveats as for Adam's answer regarding digits, I18N, underscores etc.
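
A minimal Python sketch of how this pattern behaves on a line of text; the sample sentence and variable names are my own. Note that, as written, the pattern also accepts an ordinary capitalized word and an all-uppercase word, so it is looser than "at least one hump".

```python
import re

# Pattern copied verbatim from the answer above.
WORD = re.compile(r"\b[A-Z][a-z]*([A-Z][a-z]*)*\b")

text = "IFoo wraps an HTTPConnection, unlike fooBar."

# group(0) is the whole match; the capturing group only holds the last hump.
found = [m.group(0) for m in WORD.finditer(text)]
print(found)

# Looser than it may look: these whole words match as well.
print(bool(WORD.fullmatch("Hello")), bool(WORD.fullmatch("NASA")))
```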

Vinay Sajip

This seems to do it:

/^[A-Z][a-z]+([A-Z][a-z]+)+/

I've included Ruby unit tests:

require 'test/unit'

REGEX = /^[A-Z][a-z]+([A-Z][a-z]+)+/

class RegExpTest < Test::Unit::TestCase
  # more readable helper
  def self.test(name, &block)
    define_method("test #{name}", &block)
  end

  test "matches camelcased word" do
    assert 'FooBar'.match(REGEX)
  end

  test "does not match words starting with lower case" do
    assert ! 'fooBar'.match(REGEX)
  end

  test "does not match words without camel hump" do
    assert ! 'Foobar'.match(REGEX)
  end

  test "matches multiple humps" do
    assert 'FooBarFizzBuzz'.match(REGEX)
  end
end
nakajima
([A-Z][a-z\d]+)+

Should do the trick for upper camel case. You can add leading underscores to it as well if you still want to consider something like _IsRunning upper camel case.
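
One possible way to allow the leading underscores mentioned above is to prefix the pattern with `_*`. This is only a sketch: the variant pattern and the names `UPPER_CAMEL_UNDERSCORE` and `is_upper_camel` are my own assumptions, not something the answer spells out.

```python
import re

# Hypothetical variant: the answer's pattern with optional leading underscores.
UPPER_CAMEL_UNDERSCORE = re.compile(r"_*([A-Z][a-z\d]+)+")


def is_upper_camel(name):
    """True if the whole name is upper camel case, ignoring leading underscores."""
    return UPPER_CAMEL_UNDERSCORE.fullmatch(name) is not None


for name in ["IsRunning", "_IsRunning", "isRunning", "_isRunning"]:
    print(name, is_upper_camel(name))
```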

ahawker

Just modified one of @AdamCrume's proposals:

([A-Z]+[a-z0-9]+)+

This will match IFrame, but not ABC. Other camel-cased words are matched, e.g. AbcDoesWork, and most importantly, it also matches simple words that do not have at least another capitalized letter, e.g. Frame.

What do you think of this version? Am I missing some important case?
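
Here is a quick Python sketch of the cases listed above, checked against the whole word with `fullmatch`. The helper name and the last two test words are my own additions; `CamelC` is included because a name ending in an uppercase letter appears to be one case this version rejects.

```python
import re

# Pattern copied verbatim from this answer.
PATTERN = re.compile(r"([A-Z]+[a-z0-9]+)+")


def matches(word):
    """True if the whole word matches the pattern."""
    return PATTERN.fullmatch(word) is not None


for word in ["IFrame", "ABC", "AbcDoesWork", "Frame", "HTTPConnection", "CamelC"]:
    print(word, matches(word))
```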

logc

([a-z0-9]+|[A-Z0-9]+[a-z0-9]*|[A-Z0-9][a-z0-9]*([A-Z0-9][a-z0-9]*)*)

A Java regex to match camel-case strings.

Mahesh Yadav

Pascal Case - no digits allowed


    ^[A-Z](([a-z]+[A-Z]?)*)$

Test Cases: https://regex101.com/library/sF2jRZ

Pascal Case - digits allowed


    ^[A-Z](([a-z0-9]+[A-Z]?)*)$

Test Cases: https://regex101.com/library/csrkQw
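
The difference between the two patterns so far can be seen with a short Python sketch. The constant names and sample words are my own; the regex101 links above remain the author's actual test cases.

```python
import re

# Patterns copied verbatim from the two sections above.
PASCAL_NO_DIGITS = re.compile(r"^[A-Z](([a-z]+[A-Z]?)*)$")
PASCAL_WITH_DIGITS = re.compile(r"^[A-Z](([a-z0-9]+[A-Z]?)*)$")

for word in ["PascalCase", "Pascal", "Version2", "PascalCASE", "pascalCase"]:
    no_digits = PASCAL_NO_DIGITS.match(word) is not None
    with_digits = PASCAL_WITH_DIGITS.match(word) is not None
    print(word, no_digits, with_digits)
```

Only `Version2` differs between the two; consecutive capitals (`PascalCASE`) and a lower-case first letter are rejected by both.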

Pascal Case - digits allowed - up to 3 upper case letters

This variation supports capitalized acronyms of 2-3 letters, such as those in IOStream, StreamIO, DeviceID, deviceID, AwsVPC, awsVPC, serialNO, SerialNO, deviceSN, DeviceSN. It is inspired by Microsoft's Capitalization Conventions.


    ^[A-Z](([A-Z]{1,2}[a-z0-9]+)+([A-Z]{1,3}[a-z0-9]+)*[A-Z]{0,3}|([a-z0-9]+[A-Z]{0,3})*|[A-Z]{1,2})$

Test Cases: https://regex101.com/library/TLTXbK
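
A hedged Python sketch of the acronym-friendly pattern; the constant name and the word list are my own. Because the pattern starts with `^[A-Z]`, names that begin with a lower-case letter (such as `deviceID`) are not matched by it, and a run of four capitals (as in `HTTPServer`) is rejected as well.

```python
import re

# Pattern copied verbatim from the section above.
PASCAL_ACRONYMS = re.compile(
    r"^[A-Z](([A-Z]{1,2}[a-z0-9]+)+([A-Z]{1,3}[a-z0-9]+)*[A-Z]{0,3}"
    r"|([a-z0-9]+[A-Z]{0,3})*|[A-Z]{1,2})$"
)

for word in ["IOStream", "StreamIO", "DeviceID", "AwsVPC", "HTTPServer", "deviceID"]:
    print(word, PASCAL_ACRONYMS.match(word) is not None)
```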

For more details on camel case and pascal case check out this repo.

rouble