
I have a method now that will convert my camel case strings to snake case, but it's broken into three calls of preg_replace():

public function camelToUnderscore($string, $us = "-")
{
    // insert hyphen between any letter and the beginning of a numeric chain
    $string = preg_replace('/([a-z]+)([0-9]+)/i', '$1'.$us.'$2', $string);
    // insert hyphen between any lower-to-upper-case letter chain
    $string = preg_replace('/([a-z]+)([A-Z]+)/', '$1'.$us.'$2', $string);
    // insert hyphen between the end of a numeric chain and the beginning of an alpha chain
    $string = preg_replace('/([0-9]+)([a-z]+)/i', '$1'.$us.'$2', $string);

    // Lowercase
    $string = strtolower($string);

    return $string;
}

I wrote tests to verify its accuracy, and it works properly with the following array of inputs (array('input' => 'output')):

$test_values = [
    'foo'       => 'foo',
    'fooBar'    => 'foo-bar',
    'foo123'    => 'foo-123',
    '123Foo'    => '123-foo',
    'fooBar123' => 'foo-bar-123',
    'foo123Bar' => 'foo-123-bar',
    '123FooBar' => '123-foo-bar',
];
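
For anyone who wants to reproduce the checks, here is a minimal sketch of such a test loop. It assumes a standalone copy of the method (the original is a class method), and `findFailures()` is a hypothetical helper name, not part of my real test suite:

```php
<?php
// Standalone copy of the method from the question (assumption: no class context needed).
function camelToUnderscore(string $string, string $us = "-"): string
{
    $string = preg_replace('/([a-z]+)([0-9]+)/i', '$1' . $us . '$2', $string);
    $string = preg_replace('/([a-z]+)([A-Z]+)/', '$1' . $us . '$2', $string);
    $string = preg_replace('/([0-9]+)([a-z]+)/i', '$1' . $us . '$2', $string);

    return strtolower($string);
}

// Hypothetical helper: returns [input => actual output] for every case
// whose output differs from the expected value.
function findFailures(callable $fn, array $cases): array
{
    $failures = [];
    foreach ($cases as $input => $expected) {
        $actual = $fn((string) $input);
        if ($actual !== $expected) {
            $failures[$input] = $actual;
        }
    }

    return $failures;
}
```

Calling `findFailures('camelToUnderscore', $test_values)` should return an empty array when every case passes, and the same helper can be pointed at any candidate replacement function.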

I'm wondering if there's a way to reduce my preg_replace() calls to a single line which will give me the same result. Any ideas?

NOTE: Referring to this post, my research has shown me a preg_replace() regex that gets me almost the result I want, except it doesn't work on the example of foo123 to convert it to foo-123.

mickmackusa
Matt

4 Answers


You can use lookarounds to do all this in a single regex:

function camelToUnderscore($string, $us = "-") {
    return strtolower(preg_replace(
        '/(?<=\d)(?=[A-Za-z])|(?<=[A-Za-z])(?=\d)|(?<=[a-z])(?=[A-Z])/', $us, $string));
}

RegEx Demo

Code Demo

RegEx Description:

(?<=\d)(?=[A-Za-z])  # if previous position has a digit and next has a letter
|                    # OR
(?<=[A-Za-z])(?=\d)  # if previous position has a letter and next has a digit
|                    # OR
(?<=[a-z])(?=[A-Z])  # if previous position has a lowercase and next has an uppercase letter
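
Because every alternative in this pattern is zero-width, it matches positions rather than characters. One way to see those positions is to split on the same pattern instead of replacing. The sketch below is only an illustration; `splitWords()` is a hypothetical helper name, not part of the answer:

```php
<?php
// Same pattern as above: each alternative matches an empty position between two characters.
$boundary = '/(?<=\d)(?=[A-Za-z])|(?<=[A-Za-z])(?=\d)|(?<=[a-z])(?=[A-Z])/';

// Hypothetical helper: splitting on zero-width matches returns the pieces between boundaries.
function splitWords(string $string, string $pattern): array
{
    return preg_split($pattern, $string);
}

print_r(splitWords('foo123Bar', $boundary)); // pieces: foo, 123, Bar
```

Joining the pieces with the separator and lowercasing the result gives the same string as the `preg_replace()` version for the inputs in the question.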
anubhava
  • Great solution but be aware of code injection while using `preg_replace`. Above solution will inject code injection vulnerability. – Sanket Gandhi Mar 10 '23 at 04:40

Here are my two cents, based on the duplicate post I flagged earlier. The accepted solution here is awesome; I just wanted to try solving it with what was shared:

function camelToUnderscore($string, $us = "-") {
    return strtolower(preg_replace('/(?<!^)[A-Z]+|(?<!^|\d)[\d]+/', $us.'$0', $string));
}

Example:

Array
(
    [0] => foo
    [1] => fooBar
    [2] => foo123
    [3] => 123Foo
    [4] => fooBar123
    [5] => foo123Bar
    [6] => 123FooBar
)

foreach ($arr as $item) {
    echo camelToUnderscore($item);
    echo "\r\n";
}

Output:

foo
foo-bar
foo-123
123-foo
foo-bar-123
foo-123-bar
123-foo-bar

Explanation:

(?<!^)[A-Z]+      // Match one or more capital letters not at the start of the string
|                 // OR
(?<!^|\d)[\d]+    // Match a run of digits that is not at the start of the string

$us.'$0'          // Replace each match with the separator followed by the match itself

online regex

The question is already solved so I won't say that I hope it helps but maybe someone will find this useful.


EDIT

This regex has limitations:

foo123bar => foo-123bar
fooBARFoo => foo-barfoo
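
To make these limits easy to reproduce, here is a small sketch that runs both inputs through this answer's pattern and through the lookaround pattern from the earlier answer. The function names `prefixStyle()` and `boundaryStyle()` are made up for the comparison:

```php
<?php
// Pattern from this answer: prefixes each uppercase run or digit run with the separator.
function prefixStyle(string $s, string $us = '-'): string
{
    return strtolower(preg_replace('/(?<!^)[A-Z]+|(?<!^|\d)[\d]+/', $us . '$0', $s));
}

// Lookaround pattern from the earlier answer, included only for comparison.
function boundaryStyle(string $s, string $us = '-'): string
{
    return strtolower(preg_replace(
        '/(?<=\d)(?=[A-Za-z])|(?<=[A-Za-z])(?=\d)|(?<=[a-z])(?=[A-Z])/', $us, $s));
}

foreach (['foo123bar', 'fooBARFoo'] as $input) {
    echo $input, ' => ', prefixStyle($input), ' vs ', boundaryStyle($input), PHP_EOL;
}
```

For `foo123bar` the two patterns differ (`foo-123bar` versus `foo-123-bar`). For `fooBARFoo` both give `foo-barfoo`, since neither pattern looks for an uppercase run followed by a capitalised word.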

Thanks to @urban for pointing it out. Here is his link with tests of the three solutions posted on this question:

three solutions demo

JazZ
  • Your solution is different from the OP solution: it doesn't take in account the case `foo123bar`... See [code demo](http://ideone.com/Mr9wN5) the difference between OP's solution, anubhava's solution and your solution. – Urban Nov 10 '16 at 10:14
  • @urban `foo123bar` is not camelCase. But you're right, there are limits with this regex and it's not the best solution... Something like `fooBARFoo` will produce `foo-barfoo`. Anyway, this will work for basic camelCase. I edited the answer. Thanks for your feedback! – JazZ Nov 10 '16 at 11:53

From a colleague:

$string = preg_replace(array($pattern1, $pattern2), $us.'$1', $string); might work

My solution:

public function camelToUnderscore($string, $us = "-")
{
    $patterns = [
        '/([a-z]+)([0-9]+)/i',
        '/([a-z]+)([A-Z]+)/',
        '/([0-9]+)([a-z]+)/i'
    ];
    $string = preg_replace($patterns, '$1'.$us.'$2', $string);

    // Lowercase
    $string = strtolower($string);

    return $string;
}
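
When `preg_replace()` receives an array of patterns and a single replacement string, it applies the patterns one after another, in array order, with the same replacement. The sketch below compares that with an explicit loop; `viaArray()` and `viaLoop()` are hypothetical names used only for this check:

```php
<?php
$patterns = [
    '/([a-z]+)([0-9]+)/i',
    '/([a-z]+)([A-Z]+)/',
    '/([0-9]+)([a-z]+)/i',
];

// One call with an array of patterns and a single replacement string...
function viaArray(string $s, array $patterns, string $us = '-'): string
{
    return strtolower(preg_replace($patterns, '$1' . $us . '$2', $s));
}

// ...versus one call per pattern, in the same order.
function viaLoop(string $s, array $patterns, string $us = '-'): string
{
    foreach ($patterns as $pattern) {
        $s = preg_replace($pattern, '$1' . $us . '$2', $s);
    }

    return strtolower($s);
}
```

This shortens the code but not the work: each string is still scanned once per pattern.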
Matt

You don't need to suffer the inefficiency of loads of lookarounds or multiple sets of patterns to target the positions between words or consecutive numbers.

Use greedy matching to find the desired sequences, reset the start of the full-string match with \K, then check that the position is not the end of the string. Every qualifying position receives the delimiting character. This greedy pattern is fast because it consumes whole sequences in one go and never looks back.

I'll omit the strtolower() call from my answer because it is merely noise for the challenge.

Code: (Demo)

preg_replace(
    '/(?:\d++|[A-Za-z]?[a-z]++)\K(?!$)/',
    '-',
    $tests
)
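
To compare this directly with the expected outputs in the question, the pattern can be wrapped in a function with `strtolower()` added back. This is only a sketch; `viaKeepOut()` is a hypothetical name:

```php
<?php
// Pattern from this answer. \K discards the consumed run from the reported match,
// so only the empty position after it is replaced with the separator.
function viaKeepOut(string $s, string $us = '-'): string
{
    return strtolower(preg_replace('/(?:\d++|[A-Za-z]?[a-z]++)\K(?!$)/', $us, $s));
}

foreach (['foo', 'fooBar', 'foo123', '123Foo', 'fooBar123', 'foo123Bar', '123FooBar'] as $input) {
    echo $input, ' => ', viaKeepOut($input), PHP_EOL;
}
```

The `(?!$)` lookahead is what keeps a separator from being appended after the final word or number.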

Processing between words/numbers:

User         Steps  Pattern                                                        Replacement
Anubhava     660    /(?<=\d)(?=[A-Za-z])|(?<=[A-Za-z])(?=\d)|(?<=[a-z])(?=[A-Z])/  '-'
mickmackusa  337    /(?:\d++|[A-Za-z]?[a-z]++)\K(?!$)/                             '-'

Strict camelCase processing:

User         Steps  Pattern                            Replacement
JazZ         321    /(?<!^)[A-Z]+|(?<!^|\d)[\d]+/      '-$0'
mickmackusa  250    /(?>\d+|[A-Z][a-z]*|[a-z]+)(?!$)/  '$0-'
mickmackusa  244    /(?:\d++|[a-z]++)\K(?!$)/          '-'

I have discounted @Matt's answer because it makes three whole passes over each string; it isn't even in the same ballpark in terms of efficiency.

mickmackusa