Using preg_replace() to convert alphanumeric strings from camelCase to snake_case

Question

I currently have a method that converts my camelCase strings to snake case, but it takes three separate preg_replace() calls:

public function camelToUnderscore($string, $us = "-")
{
    // insert hyphen between any letter and the beginning of a numeric chain
    $string = preg_replace('/([a-z]+)([0-9]+)/i', '$1'.$us.'$2', $string);
    // insert hyphen between a lowercase chain and a following uppercase chain
    $string = preg_replace('/([a-z]+)([A-Z]+)/', '$1'.$us.'$2', $string);
    // insert hyphen between the end of a numeric chain and the beginning of an alpha chain
    $string = preg_replace('/([0-9]+)([a-z]+)/i', '$1'.$us.'$2', $string);

    // Lowercase
    $string = strtolower($string);

    return $string;
}

I wrote tests to verify its accuracy, and it works properly with the following array of inputs (array('input' => 'output')):

$test_values = [
    'foo'       => 'foo',
    'fooBar'    => 'foo-bar',
    'foo123'    => 'foo-123',
    '123Foo'    => '123-foo',
    'fooBar123' => 'foo-bar-123',
    'foo123Bar' => 'foo-123-bar',
    '123FooBar' => '123-foo-bar',
];
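A minimal sketch of such a test loop. It assumes the method is pulled out of its class into a free function. The name `camelToKebabThreePass` and the `$failures` array are illustrative; neither comes from the question:

```php
<?php
// The question's three-pass method, extracted as a free function (hypothetical name).
function camelToKebabThreePass(string $string, string $us = "-"): string
{
    // any letter chain followed by a digit chain (case-insensitive)
    $string = preg_replace('/([a-z]+)([0-9]+)/i', '$1' . $us . '$2', $string);
    // lowercase chain followed by an uppercase chain
    $string = preg_replace('/([a-z]+)([A-Z]+)/', '$1' . $us . '$2', $string);
    // digit chain followed by any letter chain (case-insensitive)
    $string = preg_replace('/([0-9]+)([a-z]+)/i', '$1' . $us . '$2', $string);
    return strtolower($string);
}

$test_values = [
    'foo'       => 'foo',
    'fooBar'    => 'foo-bar',
    'foo123'    => 'foo-123',
    '123Foo'    => '123-foo',
    'fooBar123' => 'foo-bar-123',
    'foo123Bar' => 'foo-123-bar',
    '123FooBar' => '123-foo-bar',
];

// Collect every input whose actual output differs from the expected one.
$failures = [];
foreach ($test_values as $input => $expected) {
    $actual = camelToKebabThreePass($input);
    if ($actual !== $expected) {
        $failures[$input] = $actual;
    }
}

echo $failures === [] ? "all passed\n" : print_r($failures, true);
```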

I'm wondering if there's a way to reduce my preg_replace() calls to a single line which will give me the same result. Any ideas?

NOTE: Referring to this post, my research turned up a preg_replace() regex that gets almost the result I want. However, it doesn't convert foo123 to foo-123.

@AdrienLeber read the bottom line of my question. It is not a duplicate. I read that post, and it did not help with my question. — Matt, Nov 09 '16 at 19:04
Sorry, I removed the duplicate flag and posted a new answer based on what was shared in the post you referred to in your question. — JazZ, Nov 09 '16 at 20:29
@Matt Yup, little too quick on the trigger there, my apologies. — Pieter van den Ham, Nov 09 '16 at 23:03
It is more common for researchers to search for "snake_case", but your minimal reproducible example is seeking "kebab-case" ...the strings are being "skewered". — mickmackusa, Mar 09 '23 at 16:25

anubhava · Accepted Answer · score 28 · 2016-11-09T19:20:30.983

You can use lookarounds to do all this in a single regex:

function camelToUnderscore($string, $us = "-") {
    return strtolower(preg_replace(
        '/(?<=\d)(?=[A-Za-z])|(?<=[A-Za-z])(?=\d)|(?<=[a-z])(?=[A-Z])/', $us, $string));
}

RegEx Description:

(?<=\d)(?=[A-Za-z])  # if the previous position has a digit and the next has a letter
|                    # OR
(?<=[A-Za-z])(?=\d)  # if the previous position has a letter and the next has a digit
|                    # OR
(?<=[a-z])(?=[A-Z])  # if the previous position has a lowercase letter and the next has an uppercase letter
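The pattern matches only zero-width positions, so the separator is simply the replacement string. Passing `'_'` therefore gives the snake_case named in the title. Below is a sketch of the same function with `'_'` as the default. The name `camelToSnake` is illustrative:

```php
<?php
// anubhava's single-regex approach; only the default separator differs (hypothetical name).
function camelToSnake(string $string, string $us = "_"): string
{
    // Every alternative is a pair of lookarounds, so each match is empty and
    // preg_replace() simply inserts $us at that position.
    return strtolower(preg_replace(
        '/(?<=\d)(?=[A-Za-z])|(?<=[A-Za-z])(?=\d)|(?<=[a-z])(?=[A-Z])/',
        $us,
        $string
    ));
}

foreach (['foo', 'fooBar', 'foo123', '123Foo', 'fooBar123', 'foo123Bar', '123FooBar'] as $input) {
    echo $input, ' => ', camelToSnake($input), PHP_EOL;
}
```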

answered Nov 09 '16 at 19:10, edited Nov 09 '16 at 19:20

Great solution, but be aware of code injection when using `preg_replace`. The solution above will introduce a code injection vulnerability. – Sanket Gandhi Mar 10 '23 at 04:40

JazZ · Answer 2 · 2016-11-10T12:12:38.800

Here are my two cents, based on the duplicate post I flagged earlier. The accepted solution here is awesome; I just wanted to try solving it with what was shared there:

function camelToUnderscore($string, $us = "-") {
    return strtolower(preg_replace('/(?<!^)[A-Z]+|(?<!^|\d)[\d]+/', $us.'$0', $string));
}

Example:

$arr = ['foo', 'fooBar', 'foo123', '123Foo', 'fooBar123', 'foo123Bar', '123FooBar'];

foreach ($arr as $item) {
    echo camelToUnderscore($item);
    echo "\r\n";
}

Output:

foo
foo-bar
foo-123
123-foo
foo-bar-123
foo-123-bar
123-foo-bar

Explanation:

(?<!^)[A-Z]+      // Match one or more capital letters not at the start of the string
|                 // OR
(?<!^|\d)[\d]+    // Match one or more digits not at the start of the string and not preceded by a digit

$us.'$0'          // Replace each match with the separator followed by the match itself

The question is already solved, so I won't say I hope it helps, but maybe someone will find this useful.

EDIT

This regex has limitations:

foo123bar => foo-123bar
fooBARFoo => foo-barfoo

Thanks to @urban for pointing this out. Here is his link with tests of the three solutions posted on this question:

three solutions demo
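The sketch below runs both cases side by side against the accepted answer's pattern. The function names `kebabPrefixRuns` and `kebabLookarounds` are illustrative. The lookaround pattern should also turn `fooBARFoo` into `foo-barfoo`, because neither pattern splits inside a run of capitals. Only `foo123bar` differs:

```php
<?php
// JazZ's pattern: prefix each capital run and each digit run (not at the start) with the separator.
function kebabPrefixRuns(string $s, string $us = '-'): string
{
    return strtolower(preg_replace('/(?<!^)[A-Z]+|(?<!^|\d)[\d]+/', $us . '$0', $s));
}

// Accepted answer's pattern: insert the separator at zero-width letter/digit and lower/upper boundaries.
function kebabLookarounds(string $s, string $us = '-'): string
{
    return strtolower(preg_replace(
        '/(?<=\d)(?=[A-Za-z])|(?<=[A-Za-z])(?=\d)|(?<=[a-z])(?=[A-Z])/',
        $us,
        $s
    ));
}

// Columns: input, prefix-runs result, lookaround result.
foreach (['foo123bar', 'fooBARFoo'] as $input) {
    printf("%-10s %-12s %s\n", $input, kebabPrefixRuns($input), kebabLookarounds($input));
}
```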

Your solution is different from the OP's solution: it doesn't take into account the case `foo123bar`. See the [code demo](http://ideone.com/Mr9wN5) for the difference between the OP's solution, anubhava's solution and your solution. — Urban, Nov 10 '16 at 10:14
@urban `foo123bar` is not camelCase. But you're right, there are limits to this regex and it's not the best solution. Something like `fooBARFoo` will produce `foo-barfoo`. Anyway, this works for basic camelCase. I'll edit the answer. Thanks for your feedback! — JazZ, Nov 10 '16 at 11:53

Matt · Answer 3 · score 2 · answered Nov 09 '16 at 19:08

From a colleague, this might work:

$string = preg_replace(array($pattern1, $pattern2), $us.'$1', $string);

My solution:

public function camelToUnderscore($string, $us = "-")
{
    $patterns = [
        '/([a-z]+)([0-9]+)/i',
        '/([a-z]+)([A-Z]+)/',
        '/([0-9]+)([a-z]+)/i'
    ];
    $string = preg_replace($patterns, '$1'.$us.'$2', $string);

    // Lowercase
    $string = strtolower($string);

    return $string;
}
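When `preg_replace()` receives an array of patterns and a single replacement string, it applies the patterns in order. Each pattern runs on the result of the previous one, so this version still makes three passes over the string. The sketch below checks the array form against explicit chained calls. The constant `PATTERNS` and the functions `viaArray` and `viaChain` are illustrative names:

```php
<?php
const PATTERNS = [
    '/([a-z]+)([0-9]+)/i', // letters followed by digits
    '/([a-z]+)([A-Z]+)/',  // lowercase followed by uppercase
    '/([0-9]+)([a-z]+)/i', // digits followed by letters
];

// One call with an array of patterns (hypothetical name).
function viaArray(string $s, string $us = '-'): string
{
    return strtolower(preg_replace(PATTERNS, '$1' . $us . '$2', $s));
}

// Three explicit calls, each fed the previous result (hypothetical name).
function viaChain(string $s, string $us = '-'): string
{
    foreach (PATTERNS as $pattern) {
        $s = preg_replace($pattern, '$1' . $us . '$2', $s);
    }
    return strtolower($s);
}

foreach (['fooBar123', 'foo123Bar', '123FooBar', 'foo123bar'] as $input) {
    echo $input, ': ', viaArray($input), ' / ', viaChain($input), PHP_EOL;
}
```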

mickmackusa · Answer 4 · 2023-03-09T16:51:01.043

You don't need to suffer the inefficiency of loads of lookarounds or multiple sets of patterns to target the positions between words or consecutive numbers.

Use greedy matching to find the desired sequences, then reset the full-string match with \K, then check that the position is not the end of the string. Every position that qualifies receives the delimiting character. This greedy pattern is fast because it consumes one or more whole sequences and never looks back.
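`\K` is easy to misread, so here is a tiny sketch of what resetting the match means. Text consumed before `\K` must still match, but it is left out of `$0`. If nothing follows `\K`, the match becomes a pure insertion point. The sample strings are arbitrary:

```php
<?php
// Without \K, $0 is the whole match "foobar".
echo preg_replace('/foobar/', '[$0]', 'foobar'), PHP_EOL;   // [foobar]
// With \K, "foo" must still match but is dropped from $0, which becomes "bar".
echo preg_replace('/foo\Kbar/', '[$0]', 'foobar'), PHP_EOL; // foo[bar]
// With nothing after \K, the reported match is empty: a pure insertion point.
echo preg_replace('/foo\K/', '-', 'foobar'), PHP_EOL;       // foo-bar
```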

I'll omit the strtolower() call from my answer because it is merely noise for the challenge.

Code:

$tests = ['foo', 'fooBar', 'foo123', '123Foo', 'fooBar123', 'foo123Bar', '123FooBar'];

var_export(preg_replace(
    '/(?:\d++|[A-Za-z]?[a-z]++)\K(?!$)/',
    '-',
    $tests
));
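To fit the question's signature, the pattern can be wrapped with the `strtolower()` call restored and the separator passed in. The name `camelToKebabGreedy` is illustrative. Because `$us` is used as a replacement string, sequences such as `$1` or `\1` in it would be read as backreferences:

```php
<?php
// This answer's greedy/possessive pattern, wrapped in the question's signature (hypothetical name).
// $us is used as a preg_replace() replacement string, so "$1" or "\1" in it would act as backreferences.
function camelToKebabGreedy(string $string, string $us = '-'): string
{
    // Match a digit run or an (optionally capitalised) lowercase run, reset the match with \K,
    // and insert $us at that position unless it is the end of the string.
    return strtolower(preg_replace('/(?:\d++|[A-Za-z]?[a-z]++)\K(?!$)/', $us, $string));
}

$test_values = [
    'foo'       => 'foo',
    'fooBar'    => 'foo-bar',
    'foo123'    => 'foo-123',
    '123Foo'    => '123-foo',
    'fooBar123' => 'foo-bar-123',
    'foo123Bar' => 'foo-123-bar',
    '123FooBar' => '123-foo-bar',
];

foreach ($test_values as $input => $expected) {
    $actual = camelToKebabGreedy($input);
    echo $input, ' => ', $actual, $actual === $expected ? '' : '  (mismatch)', PHP_EOL;
}
```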

Processing between words/numbers:

User         Steps  Pattern                                                          Replacement
anubhava     660    `/(?<=\d)(?=[A-Za-z])|(?<=[A-Za-z])(?=\d)|(?<=[a-z])(?=[A-Z])/`  `'-'`
mickmackusa  337    `/(?:\d++|[A-Za-z]?[a-z]++)\K(?!$)/`                             `'-'`

Strict camelCase processing:

User         Steps  Pattern                              Replacement
JazZ         321    `/(?<!^)[A-Z]+|(?<!^|\d)[\d]+/`      `'-$0'`
mickmackusa  250    `/(?>\d+|[A-Z][a-z]*|[a-z]+)(?!$)/`  `'$0-'`
mickmackusa  244    `/(?:\d++|[a-z]++)\K(?!$)/`          `'-'`
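The atomic group `(?>...)` in the 250-step pattern stops the engine from backtracking into a run to find a shorter match that ends before the end of the string. The sketch below compares it with the same alternation inside a plain `(?:...)` group. The `$nonAtomic` pattern is illustrative only:

```php
<?php
$atomic    = '/(?>\d+|[A-Z][a-z]*|[a-z]+)(?!$)/'; // pattern from the table
$nonAtomic = '/(?:\d+|[A-Z][a-z]*|[a-z]+)(?!$)/'; // same alternation, backtracking allowed

foreach (['foo', 'fooBar123'] as $input) {
    echo $input, ': ',
        preg_replace($atomic, '$0-', $input), ' vs ',
        preg_replace($nonAtomic, '$0-', $input), PHP_EOL;
}
// foo: foo vs fo-o
// fooBar123: foo-Bar-123 vs foo-Bar-12-3
```

Without the atomic group, a run that reaches the end of the string is shortened by one character so that `(?!$)` can succeed. That is why `foo` becomes `fo-o`.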

I have discounted @Matt's answer because it makes three whole passes over each string; it isn't even in the same ballpark in terms of efficiency.
