Split camelCase word into words with php preg_match (Regular Expression)

Question

How would I go about splitting the word:

oneTwoThreeFour

into an array so that I can get:

one Two Three Four

with preg_match ?

I tried this, but it just gives the whole word:

$words = preg_match("/[a-zA-Z]*(?:[a-z][a-zA-Z]*[A-Z]|[A-Z][a-zA-Z]*[a-z])[a-zA-Z]*\b/", $string, $matches);

Maybe my question can help you, I asked same thing yesterday, but about Java https://stackoverflow.com/questions/4502273/breaking-strings-into-chars-that-are-in-upper-case — Gondim, Dec 23 '10 at 14:47

score 89 · Answer 1 · edited Oct 08 '18 at 17:14


You can use preg_split as:

$arr = preg_split('/(?=[A-Z])/',$str);


I'm basically splitting the input string just before each uppercase letter. The regex used, (?=[A-Z]), is a zero-width lookahead: it matches the position just before an uppercase letter without consuming any text.
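As a quick check of the lookahead split, here is a minimal sketch (the variable names are illustrative and not from the answer). One caveat: when the input starts with an uppercase letter, the lookahead also matches at offset 0, which usually leaves an empty first element. Passing PREG_SPLIT_NO_EMPTY is one way to drop it.

```php
<?php
// Split just before every uppercase letter (zero-width lookahead).
$camel = preg_split('/(?=[A-Z])/', 'oneTwoThreeFour');
var_export($camel);   // one, Two, Three, Four
echo "\n";

// TitleCase input: the lookahead also matches at offset 0, so ask
// preg_split() to discard empty pieces.
$title = preg_split('/(?=[A-Z])/', 'OneTwoThree', -1, PREG_SPLIT_NO_EMPTY);
var_export($title);   // One, Two, Three
echo "\n";
```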

answered Dec 23 '10 at 14:46 by codaddict · edited Oct 08 '18 at 17:14 by Cœur

score 84 · Accepted Answer · edited Nov 16 '17 at 18:16


You can also use preg_match_all as:

preg_match_all('/((?:^|[A-Z])[a-z]+)/',$str,$matches);

Explanation:

(        - Start of capturing parenthesis.
 (?:     - Start of non-capturing parenthesis.
  ^      - Start anchor.
  |      - Alternation.
  [A-Z]  - Any one capital letter.
 )       - End of non-capturing parenthesis.
 [a-z]+  - One or more lowercase letters.
)        - End of capturing parenthesis.
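To make the explanation above concrete, the following sketch shows what ends up in $matches, and why input with consecutive capitals (such as "TestID" from the comments) loses text. The variable names are illustrative only.

```php
<?php
// With the outer capturing group, $matches holds two identical sub-arrays.
preg_match_all('/((?:^|[A-Z])[a-z]+)/', 'oneTwoThreeFour', $matches);
$words     = $matches[0];   // full matches: one, Two, Three, Four
$sameAgain = $matches[1];   // group 1 repeats the same strings

// Consecutive capitals are not matched, because every word must end in [a-z]+.
preg_match_all('/((?:^|[A-Z])[a-z]+)/', 'TestID', $partial);
$found = $partial[0];       // only "Test"; "ID" is dropped

echo implode(' ', $words), "\n";
echo implode(' ', $found), "\n";
```

Dropping the outer parentheses would leave only the [0] sub-array, which is usually all that is needed.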

answered Dec 23 '10 at 14:52 by codaddict · edited Nov 16 '17 at 18:16 by Mike W

Wouldn't the non-capturing group cause the result to be [one, wo, hree, our]? – Aaron J Lang Jan 16 '14 at 12:21

@AaronJLang no, because the outer parentheses capture the WHOLE group, including the sub-group. It's a sub-group that he doesn't want to clutter the $matches collection. – Eli Gassert Mar 20 '14 at 11:28

This failed for me with "TestID" using: "preg_match_all('/((?:^|[A-Z])[a-z]+)/', $key, $matches); die(implode(' ', $matches[0]));" because it doesn't like the CONSECUTIVE CAPS issue. I needed to split case changes with spaces and @blak3r's solution worked for me: http://stackoverflow.com/a/17122207/539149 – Zack Morris Sep 03 '15 at 20:30

Better solution for strings like `HTMLParser` that will work: http://stackoverflow.com/a/6572999/1697320. – Maciej Sz Oct 21 '16 at 11:38
As stipulated by @TarranJones (although not articulated too clearly), you don't need the outer parentheses. A pattern of `'/(?:^|[A-Z])[a-z]+/'` would suffice to produce one array (instead of two). This is because `preg_match_all()` automatically collects every full match, without you having to capture it explicitly. – cartbeforehorse Jan 31 '18 at 09:38

ridgerunner · Answer 3 · 2014-04-12T14:30:13.870

I know that this is an old question with an accepted answer, but IMHO there is a better solution:

<?php // test.php Rev:20140412_0800
$ccWord = 'NewNASAModule';
$re = '/(?#! splitCamelCase Rev:20140412)
    # Split camelCase "words". Two global alternatives. Either g1of2:
      (?<=[a-z])      # Position is after a lowercase,
      (?=[A-Z])       # and before an uppercase letter.
    | (?<=[A-Z])      # Or g2of2; Position is after uppercase,
      (?=[A-Z][a-z])  # and before upper-then-lower case.
    /x';
$a = preg_split($re, $ccWord);
$count = count($a);
for ($i = 0; $i < $count; ++$i) {
    printf("Word %d of %d = \"%s\"\n",
        $i + 1, $count, $a[$i]);
}
?>

Note that this regex, (like codaddict's '/(?=[A-Z])/' solution - which works like a charm for well formed camelCase words), matches only a position within the string and consumes no text at all. This solution has the additional benefit that it also works correctly for not-so-well-formed pseudo-camelcase words such as: StartsWithCap and: hasConsecutiveCAPS.

Input:

oneTwoThreeFour
StartsWithCap
hasConsecutiveCAPS
NewNASAModule

Output:

Word 1 of 4 = "one"
Word 2 of 4 = "Two"
Word 3 of 4 = "Three"
Word 4 of 4 = "Four"

Word 1 of 3 = "Starts"
Word 2 of 3 = "With"
Word 3 of 3 = "Cap"

Word 1 of 3 = "has"
Word 2 of 3 = "Consecutive"
Word 3 of 3 = "CAPS"

Word 1 of 3 = "New"
Word 2 of 3 = "NASA"
Word 3 of 3 = "Module"

Edited: 2014-04-12: Modified regex, script and test data to correctly split: "NewNASAModule" case (in response to rr's comment).

This is a much better solution, works first time (others added blank values to the array, this one is perfect)! Thanks! +1 — Anil, May 08 '13 at 13:43
There seems to be a problem with strings like `NewNASAModule` (outputs: `[New, NASAModule]`; I'd expect `[New, NASA, Module]`) — rr-, Apr 12 '14 at 08:55
@rr - Yes you are correct. See my other updated answer which splits: `NewNASAModule` correctly: [RegEx to split camelCase or TitleCase (advanced)](http://stackoverflow.com/a/7599674/433790) — ridgerunner, Apr 12 '14 at 14:02
It doesn't cover cases with digits. For some reason other repliers also ignore this basic fact. E.g. "Css3Transform" or alike — Onkeltem, Aug 29 '19 at 16:01

rr- · Answer 4 · 2014-04-12T13:36:06.527

While ridgerunner's answer works great, it seems not to work with all-caps substrings that appear in the middle of the string. I use the following, and it seems to deal with these just fine:

function splitCamelCase($input)
{
    return preg_split(
        '/(^[^A-Z]+|[A-Z][^A-Z]+)/',
        $input,
        -1, /* no limit on the number of pieces */
        PREG_SPLIT_NO_EMPTY /* don't return empty elements */
            | PREG_SPLIT_DELIM_CAPTURE /* also return the captured delimiters, so nothing is stripped */
    );
}

Some test cases:

assert(splitCamelCase('lowHigh') == ['low', 'High']);
assert(splitCamelCase('WarriorPrincess') == ['Warrior', 'Princess']);
assert(splitCamelCase('SupportSEELE') == ['Support', 'SEELE']);
assert(splitCamelCase('LaunchFLEIAModule') == ['Launch', 'FLEIA', 'Module']);
assert(splitCamelCase('anotherNASATrip') == ['another', 'NASA', 'Trip']);

score 13 · Answer 5 · answered Jun 15 '13 at 09:38

A functionized version of @ridgerunner's answer.

/**
 * Converts camelCase string to have spaces between each.
 * @param $camelCaseString
 * @return string
 */
function fromCamelCase($camelCaseString) {
        $re = '/(?<=[a-z])(?=[A-Z])/x';
        $a = preg_split($re, $camelCaseString);
        return implode(" ", $a); // separator first; the reversed argument order no longer works in PHP 8
}

ArtisticPheonix · Answer 6 · 2012-02-02T00:09:51.237

$string = preg_replace( '/([a-z0-9])([A-Z])/', "$1 $2", $string );

The trick is that the pattern matches at every boundary between a lowercase letter or digit ($1) and an uppercase letter ($2), and the replacement puts a space between them. For example, in helloWorld $1 matches "o" and $2 matches "W", so you get "hello World"; HelloWorld becomes "Hello World" the same way. You can then lowercase the words, uppercase the first word, explode them on the space, or use a _ or some other character to keep them separate.

Short and simple.
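A short sketch of how this replacement behaves on a few inputs taken from elsewhere on this page (the variable names are illustrative). It also shows a limitation: an acronym directly followed by a capitalised word is not separated, because no lowercase letter or digit sits between them.

```php
<?php
$pattern = '/([a-z0-9])([A-Z])/';

// Insert a space at every lowercase-or-digit -> uppercase boundary.
$spaced = preg_replace($pattern, '$1 $2', 'helloWorld');    // "hello World"
$words  = explode(' ', $spaced);                            // hello, World

// Digits count as the left-hand side of a boundary.
$css = preg_replace($pattern, '$1 $2', 'Css3Transform');    // "Css3 Transform"

// An acronym followed by a word stays glued together.
$nasa = preg_replace($pattern, '$1 $2', 'NewNASAModule');   // "New NASAModule"

echo $spaced, "\n", $css, "\n", $nasa, "\n";
```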

score 5 · Answer 7 · answered Apr 15 '19 at 01:54

When determining the best pattern for your project, you will need to consider the following pattern factors:

Accuracy (Robustness) -- whether the pattern is correct in all cases and is reasonably future-proof
Efficiency -- the pattern should be direct, deliberate, and avoid unnecessary labor
Brevity -- the pattern should use appropriate techniques to avoid unnecessary character length
Readability -- the pattern should be kept as simple as possible

The above factors also happen to be in the hierarchical order that I strive to obey. In other words, it doesn't make much sense to me to prioritize 2, 3, or 4 when 1 doesn't quite satisfy the requirements. Readability is at the bottom of the list for me because in most cases I can follow the syntax.

Capture Groups and Lookarounds often impact pattern efficiency. The truth is, unless you are executing this regex on thousands of input strings, there is no need to toil over efficiency. It is perhaps more important to focus on pattern readability which can be associated with pattern brevity.

Some patterns below will require some additional handling/flagging by their preg_ function, but here are some pattern comparisons based on the OP's sample input:

preg_split() patterns:

/^[^A-Z]+\K|[A-Z][^A-Z]+\K/ (21 steps)
/(^[^A-Z]+|[A-Z][^A-Z]+)/ (26 steps)
/[^A-Z]+\K(?=[A-Z])/ (43 steps)
/(?=[A-Z])/ (50 steps)
/(?=[A-Z]+)/ (50 steps)
/([a-z]{1})[A-Z]{1}/ (53 steps)
/([a-z0-9])([A-Z])/ (68 steps)
/(?<=[a-z])(?=[A-Z])/x (94 steps) ...for the record, the x is useless.
/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/ (134 steps)

preg_match_all() patterns:

/[A-Z]?[a-z]+/ (14 steps)
/((?:^|[A-Z])[a-z]+)/ (35 steps)

I'll point out that there is a subtle difference between the output of preg_match_all() and preg_split(). preg_match_all() will output a 2-dimensional array, in other words, all of the fullstring matches will be in the [0] subarray; if there is a capture group used, those substrings will be in the [1] subarray. On the other hand, preg_split() only outputs a 1-dimensional array and therefore provides a less bloated and more direct path to the desired output.

Some of the patterns are insufficient when dealing with camelCase strings that contain an ALLCAPS/acronym substring in them. If this is a fringe case that is possible within your project, it is logical to only consider patterns that handle these cases correctly. I will not be testing TitleCase input strings because that is creeping too far from the question.

New Extended Battery of Test Strings:

oneTwoThreeFour
hasConsecutiveCAPS
newNASAModule
USAIsGreatAgain

Suitable preg_split() patterns:

/[a-z]+\K|(?=[A-Z][a-z]+)/ (149 steps) *I had to use [a-z] for the demo to count properly
/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/ (547 steps)

Suitable preg_match_all() pattern:

/[A-Z]?[a-z]+|[A-Z]+(?=[A-Z][a-z]|$)/ (75 steps)
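One way to sanity-check the acronym-aware patterns is to run the whole battery through both of them and compare the results. This is only a sketch: the loop and variable names are assumptions for illustration, and it uses the [a-z] variants exactly as listed above.

```php
<?php
$tests = ['oneTwoThreeFour', 'hasConsecutiveCAPS', 'newNASAModule', 'USAIsGreatAgain'];

$splitPattern = '/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/';
$matchPattern = '/[A-Z]?[a-z]+|[A-Z]+(?=[A-Z][a-z]|$)/';

$results = [];
$allAgree = true;
foreach ($tests as $test) {
    $bySplit = preg_split($splitPattern, $test);   // 1-dimensional result
    preg_match_all($matchPattern, $test, $out);    // full matches are in $out[0]
    $results[$test] = $bySplit;
    $agree = ($bySplit === $out[0]);
    $allAgree = $allAgree && $agree;
    echo $test, ' => ', implode(' ', $bySplit),
         ($agree ? '' : ' (patterns disagree)'), "\n";
}
```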

Finally, here are my recommendations based on my pattern principles / factor hierarchy. I recommend preg_split() over preg_match_all() (despite the preg_match_all() patterns taking fewer steps) because it leads more directly to the desired output structure. (Of course, choose whatever you like.)

Code: (Demo)

$noAcronyms = 'oneTwoThreeFour';
var_export(preg_split('~^[^A-Z]+\K|[A-Z][^A-Z]+\K~', $noAcronyms, 0, PREG_SPLIT_NO_EMPTY));
echo "\n---\n";
var_export(preg_match_all('~[A-Z]?[^A-Z]+~', $noAcronyms, $out) ? $out[0] : []);

Code: (Demo)

$withAcronyms = 'newNASAModule';
var_export(preg_split('~[^A-Z]+\K|(?=[A-Z][^A-Z]+)~', $withAcronyms, 0, PREG_SPLIT_NO_EMPTY));
echo "\n---\n";
var_export(preg_match_all('~[A-Z]?[^A-Z]+|[A-Z]+(?=[A-Z][^A-Z]|$)~', $withAcronyms, $out) ? $out[0] : []);

In the last pattern, you can change `(?=[A-Z][a-z]|$)` to `(?![a-z])`. — Casimir et Hippolyte, Jun 20 '23 at 23:54

score 3 · Answer 8 · answered Aug 31 '16 at 03:52

I took cool guy Ridgerunner's code (above) and made it into a function:

echo deliciousCamelcase('NewNASAModule');

function deliciousCamelcase($str)
{
    $formattedStr = '';
    $re = '/
          (?<=[a-z])
          (?=[A-Z])
        | (?<=[A-Z])
          (?=[A-Z][a-z])
        /x';
    $a = preg_split($re, $str);
    $formattedStr = implode(' ', $a);
    return $formattedStr;
}

This will return: New NASA Module

Kobi · Answer 9 · 2019-03-01T05:26:40.967


Another option is matching /[A-Z]?[a-z]+/ - if you know your input is in the right format, it should work nicely.

[A-Z]? would match an uppercase letter (or nothing). [a-z]+ would then match all following lowercase letters, until the next match.

Working example: https://regex101.com/r/kNZfEI/1
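A minimal sketch of this pattern in use (variable names are illustrative). The second call shows what "the right format" means in practice: an all-caps run contains no lowercase letter, so it is skipped rather than returned.

```php
<?php
// Well-formed camelCase: every word is an optional capital plus lowercase letters.
preg_match_all('/[A-Z]?[a-z]+/', 'oneTwoThreeFour', $ok);
$words = $ok[0];     // one, Two, Three, Four

// An all-caps run has no lowercase letter, so it never matches.
preg_match_all('/[A-Z]?[a-z]+/', 'SupportSEELE', $lossy);
$kept = $lossy[0];   // only "Support"

echo implode(' ', $words), "\n";
echo implode(' ', $kept), "\n";
```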

answered Dec 23 '10 at 17:07 by Kobi · edited Mar 01 '19 at 05:26

Nice and lean - always prefer it this way. – benjaminhull Mar 15 '17 at 09:18
@jbobbins - Thanks, updated. ideone expired old examples at some point, so many old examples are still broken. – Kobi Mar 01 '19 at 05:28
@Kobi thanks. just so you're aware, I pasted the assertion text from the post by rr- and the ones with multiple caps together don't work. https://regex101.com/r/kNZfEI/2 – jbobbins Mar 01 '19 at 14:50

score 1 · Answer 10 · answered May 10 '23 at 14:07


This one converts camelCase to a sentence:

ucfirst(strtolower(implode(' ', preg_split('/(?=[A-Z])/', $camelCaseStr))));

"helloWorld" -> "Hello world"
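Wrapped in a function, the one-liner could look like the sketch below. The function name is hypothetical, and the PREG_SPLIT_NO_EMPTY flag is an addition that is not in the answer: without it, input that starts with a capital would likely produce a leading space, leaving ucfirst() nothing to capitalise.

```php
<?php
// Hypothetical helper; the NO_EMPTY flag is an addition to the one-liner above.
function camelToSentence(string $camelCaseStr): string
{
    $words = preg_split('/(?=[A-Z])/', $camelCaseStr, -1, PREG_SPLIT_NO_EMPTY);
    return ucfirst(strtolower(implode(' ', $words)));
}

echo camelToSentence('helloWorld'), "\n";   // Hello world
echo camelToSentence('HelloWorld'), "\n";   // Hello world
```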

answered May 10 '23 at 14:07 by Paul Tru

score 0 · Answer 11 · answered Jul 17 '13 at 15:56


You can split on a "glide" from lowercase to uppercase thus:

$parts = preg_split('/([a-z]{1})[A-Z]{1}/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);        
//PREG_SPLIT_DELIM_CAPTURE to also return bracketed things
var_dump($parts);

Annoyingly you will then have to rebuild the words from each corresponding pair of items in $parts

Hope this helps
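The rebuilding step might look like the sketch below. Note one assumption that departs from the answer: the pattern here captures both letters of the glide. With only the lowercase letter in parentheses, the uppercase letter is part of the delimiter but not captured, so preg_split() would not return it. The function name is hypothetical.

```php
<?php
// Assumption: capture BOTH letters of the glide so neither is lost.
function splitOnGlide(string $string): array   // hypothetical helper
{
    $parts = preg_split('/([a-z])([A-Z])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
    // $parts is: piece, lower, upper, piece, lower, upper, ..., last piece
    $words = [];
    $carry = '';                   // uppercase letter that starts the next word
    $last  = count($parts) - 1;
    for ($i = 0; $i < $last; $i += 3) {
        $words[] = $carry . $parts[$i] . $parts[$i + 1];
        $carry   = $parts[$i + 2];
    }
    $words[] = $carry . $parts[$last];
    return $words;
}

var_export(splitOnGlide('oneTwoThree'));   // one, Two, Three
echo "\n";
```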

answered Jul 17 '13 at 15:56 by Daniel Rhodes

oops this will probably fail on the CONSECUTIVE CAPS issue – Daniel Rhodes Jul 18 '13 at 13:51

joronimo · Answer 12 · 2015-03-12T13:57:29.443

First of all codaddict thank you for your pattern, it helped a lot!

I needed a solution that also works when the string contains the single-letter word 'A', e.g. thisIsACamelcaseSentence.

I found a solution that runs preg_match_all in two steps, and made a function with some options:

/*
 * input: 'thisIsACamelCaseSentence' output: 'This Is A Camel Case Sentence'
 * options $case: 'allUpperCase'[default] >> 'This Is A Camel Case Sentence'
 *                'allLowerCase'          >> 'this is a camel case sentence'
 *                'firstUpperCase'        >> 'This is a camel case sentence'
 * @return: string
 */

function camelCaseToWords($string, $case = null){
    $case = $case ?? 'allUpperCase'; // fall back to the default when no option is given

    // Find occurrences of two adjacent capitals
    preg_match_all('/((?:^|[A-Z])[A-Z]{1})/',$string, $twoCapitals);

    // Split them with the 'zzzzzz' string. e.g. 'AZ' turns into 'AzzzzzzZ'
    foreach($twoCapitals[0] as $match){
        $firstCapital = $match[0];
        $lastCapital = $match[1];
        $temp = $firstCapital.'zzzzzz'.$lastCapital;
        $string = str_replace($match, $temp, $string);  
    }

    // Now split words
    preg_match_all('/((?:^|[A-Z])[a-z]+)/', $string, $words);

    $output = "";
    $i = 0;
    foreach($words[0] as $word){

            switch($case){
                case 'allUpperCase':
                $word = ucfirst($word);
                break;

                case 'allLowerCase': 
                $word = strtolower($word);
                break;

                case 'firstUpperCase':
                ($i == 0) ? $word = ucfirst($word) : $word = strtolower($word);
                break;                  
            }

            // remove the 'zzzzzz' marker from the word if present
            $word = str_replace('zzzzzz','', $word);    
            $output .= $word." ";
            $i++;
    }
    return rtrim($output); // drop the trailing space added after the last word
}

Feel free to use it, and in case there is an 'easier' way to do this in one step please comment!
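As for a one-step approach, one possibility is to reuse the lookaround pattern from ridgerunner's answer above, which already splits before the last capital of a run and therefore separates a lone "A". The sketch below is an illustration under that assumption, not a drop-in replacement: the function name is hypothetical and the option names simply mirror the function above.

```php
<?php
// Hypothetical one-step variant; option names mirror the function above.
function camelCaseToWordsOneStep(string $string, string $case = 'allUpperCase'): string
{
    // Split at lower->Upper boundaries and before the last capital of a run.
    $words = preg_split('/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/', $string);

    switch ($case) {
        case 'allLowerCase':
            return strtolower(implode(' ', $words));
        case 'firstUpperCase':
            return ucfirst(strtolower(implode(' ', $words)));
        default: // 'allUpperCase'
            return implode(' ', array_map('ucfirst', $words));
    }
}

echo camelCaseToWordsOneStep('thisIsACamelCaseSentence'), "\n";
// This Is A Camel Case Sentence
```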

score 0 · Answer 13 · answered Oct 06 '18 at 02:09


Full function based on @codaddict answer:

function splitCamelCase($str) {
    $splitCamelArray = preg_split('/(?=[A-Z])/', $str);

    return ucwords(implode(' ', $splitCamelArray)); // separator first; the reversed order no longer works in PHP 8
}

answered Oct 06 '18 at 02:09 by guizo
