regular expression replace 2 characters with one

Question

I would like to use a regular expression for the following problem:

SOME_RANDOM_TEXT

should be converted to:

someRandomText

So, the _ followed by any character should be replaced with just that character in upper case. I found something like this using the tool:

_\w and $&

How do I get only the second character of the match into the replacement? Any advice? Thanks.

Some languages have flags (e.g. `\u` to convert the adjacent backreference to uppercase) that let you modify backreferences; I don't know if you have that in Java. – Asad Saeeduddin, Oct 22 '12 at 11:01
If regex were an option, you would use `_([A-Za-z])` or `_(\p{L})` in the regex and `$1` in the replacement string. The parentheses capture the letter in group #1 (assuming it's the first set of parens), and `$1` acts as a placeholder for that group in the replacement string. (`\w` is incorrect because it matches digits and the underscore in addition to letters, and `\p{L}` is more correct than `[A-Za-z]` because it matches Unicode letters, not just ASCII.) — Alan Moore, Oct 22 '12 at 12:12
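
To make the capture-group point above concrete, here is a minimal sketch in Java. The class and method names are hypothetical and only serve the illustration. It shows that `$1` keeps just the captured letter while `$0` (Java's placeholder for the whole match) keeps the underscore too. Neither changes the case of the letter.

```java
// Hypothetical demo class, not part of the original thread.
class CaptureGroupDemo {
    // "$0" re-inserts the whole match (underscore plus letter), so nothing changes.
    static String keepWholeMatch(String s) {
        return s.replaceAll("_(\\p{L})", "$0");
    }

    // "$1" re-inserts only the captured letter, so the underscore is dropped.
    static String keepLetterOnly(String s) {
        return s.replaceAll("_(\\p{L})", "$1");
    }

    public static void main(String[] args) {
        System.out.println(keepWholeMatch("SOME_RANDOM_TEXT"));
        System.out.println(keepLetterOnly("SOME_RANDOM_TEXT"));
    }
}
```

The second call removes the underscores but leaves every letter as it was, which is why the case conversion still has to happen in code.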

Brian Agnew · Accepted Answer · 2012-10-22T11:25:24.253

3

It might be easier simply to call String.split("_") and then rejoin the pieces, lower-casing each one and capitalising the first letter of every piece except the first.

Note that Apache Commons has lots of useful string-related stuff, including a join() method.
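
A minimal sketch of the split-and-rejoin idea, using only the JDK (a `StringBuilder` stands in for a library `join()`). The class name `SplitJoinCamelCase` is hypothetical, and skipping empty pieces is an assumption about how leading or doubled underscores should be treated.

```java
import java.util.Locale;

// Hypothetical helper class illustrating the split-and-rejoin approach.
class SplitJoinCamelCase {
    static String toCamelCase(String value) {
        String[] parts = value.toLowerCase(Locale.ROOT).split("_");
        StringBuilder result = new StringBuilder(value.length());
        for (String part : parts) {
            if (part.isEmpty()) {
                continue; // assumption: ignore pieces from leading or doubled underscores
            }
            if (result.length() == 0) {
                result.append(part); // first piece stays entirely lower case
            } else {
                // capitalise the first letter, append the rest unchanged
                result.append(Character.toUpperCase(part.charAt(0)))
                      .append(part, 1, part.length());
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCamelCase("SOME_RANDOM_TEXT")); // prints "someRandomText"
    }
}
```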

edited Oct 22 '12 at 11:25

answered Oct 22 '12 at 10:59

Brian Agnew


Thanks, I do not really understand how the lookbehind should work with the replacement part ($&)... –  Oct 22 '12 at 11:14
I've removed that reference to lookbehinds since the above approach is simpler/less prone to errors etc. – Brian Agnew Oct 22 '12 at 11:25

score 1 · Answer 2 · edited May 23 '17 at 10:24

The problem is that case conversion (for example, lowercase to uppercase) in the replacement string is not supported by java.util.regex. This means you will need to do the conversion programmatically, as Brian suggested. See also this thread.
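
One way to do the conversion programmatically while still using a regex to find the matches is the `Matcher.appendReplacement` / `appendTail` loop. This is a sketch under assumptions, not the method any answer here prescribes: the class name `RegexCamelCase` is hypothetical, and the pattern `_(\p{L})` follows Alan Moore's comment on the question.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: regex finds the matches, Java code does the upper-casing.
class RegexCamelCase {
    private static final Pattern UNDERSCORE_LETTER = Pattern.compile("_(\\p{L})");

    static String toCamelCase(String value) {
        Matcher m = UNDERSCORE_LETTER.matcher(value.toLowerCase(Locale.ROOT));
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Replace "_x" with "X"; quoteReplacement guards against '$' or '\' in the text.
            String upper = m.group(1).toUpperCase(Locale.ROOT);
            m.appendReplacement(out, Matcher.quoteReplacement(upper));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCamelCase("SOME_RANDOM_TEXT")); // prints "someRandomText"
    }
}
```

Because the pattern only matches an underscore followed by a letter, an underscore before a digit (as in `FOO_1`) is left in place.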


answered Oct 22 '12 at 11:34

cooltea


why this: "(?<=_)(\w)" and not that: "_\w"? – Oct 22 '12 at 11:36
Please again note that the regular expression is probably not what you are looking for. But FYI, `_\w` matches both the underscore and the alphabetical character that follows whereas `(?<=_)(\w)` matches only the alphabetical character. – cooltea Oct 22 '12 at 11:37
Yes, `_\w` consumes the underscore, but so what? You're not keeping it anyway, and you're *capturing* the letter. Even if you could use a regex for this, lookbehinds would be irrelevant. (And just so you know, `\w` matches underscores and digits as well as letters.) – Alan Moore Oct 22 '12 at 11:56
Well noted Alan, especially since the replacement removes the underscore which is needed here. – cooltea Oct 22 '12 at 12:24
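
The difference discussed in these comments can be seen in a short sketch. The class and method names are hypothetical, and the replacement text `X` is only a marker to show what each pattern consumed.

```java
// Hypothetical demo class contrasting the two patterns from the comments.
class LookbehindDemo {
    // The lookbehind checks for '_' without consuming it, so the underscore survives.
    static String withLookbehind(String s) {
        return s.replaceAll("(?<=_)\\w", "X");
    }

    // "_\w" consumes the underscore together with the following word character.
    static String consuming(String s) {
        return s.replaceAll("_\\w", "X");
    }

    public static void main(String[] args) {
        System.out.println(withLookbehind("some_text")); // prints "some_Xext"
        System.out.println(consuming("some_text"));      // prints "someXext"
        System.out.println(consuming("a__b"));           // prints "aXb": \w also matches '_'
    }
}
```

The last call illustrates Alan Moore's remark that `\w` is broader than "a letter": the second underscore is treated as the word character.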

score -2 · Answer 3 · answered Oct 22 '12 at 11:08

You can also write a simple method to do this. It's more complicated, but more optimized:

public static String toCamelCase(String value) {
    value = value.toLowerCase();
    // Caveat: getBytes() uses the platform default charset, so this is only
    // reliable when every character is encoded as a single byte (e.g. ASCII).
    byte[] source = value.getBytes();
    int maxLen = source.length;
    byte[] target = new byte[maxLen];
    int targetIndex = 0;

    for (int sourceIndex = 0; sourceIndex < maxLen; sourceIndex++) {
        byte c = source[sourceIndex];
        if (c == '_') {
            // Upper-case the byte after the underscore in place, then skip the underscore.
            if (sourceIndex < maxLen - 1)
                source[sourceIndex + 1] = (byte) Character.toUpperCase(source[sourceIndex + 1]);
            continue;
        }

        target[targetIndex++] = source[sourceIndex];
    }

    return new String(target, 0, targetIndex);
}

I like the Apache Commons libraries, but sometimes it's good to know how things work and to be able to write specific code for jobs like this.

DayS


@immerhart It works only if the default encoding is single byte per character, such as ASCII, ISO-8859-1 or similar. It fails for any Unicode encoding. But most of all, there is absolutely no reason to design like this and the correct solution would use the same amount of code. – Marko Topolnik Oct 22 '12 at 11:19
@immerhart This solution can be easily fixed, if you think you'll need it. – Marko Topolnik Oct 22 '12 at 11:21
And how? By using char[] instead of byte[]? –  Oct 22 '12 at 11:22
Oops, my bad. You're right, I assumed he just needed single-byte characters. Also, using toCharArray is, indeed, much better. – DayS Oct 22 '12 at 11:24
@DayS If you fix it, notify Brian Agnew, he'll probably remove the downvote in that case. – Marko Topolnik Oct 22 '12 at 11:25
You don't even need `toCharArray()`; just iterate through the original string with `charAt()` and use a StringBuilder to create the new one. – Alan Moore Oct 22 '12 at 11:43
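
The `charAt()` plus `StringBuilder` suggestion could look roughly like the sketch below. The class name `CharAtCamelCase` is hypothetical. It works on `char` values, so it does not depend on the platform encoding, though it is an assumption here that characters outside the Basic Multilingual Plane do not need case conversion.

```java
// Hypothetical helper class: char-based rewrite of the byte[] method above.
class CharAtCamelCase {
    static String toCamelCase(String value) {
        StringBuilder out = new StringBuilder(value.length());
        boolean upperNext = false; // true right after an underscore
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '_') {
                upperNext = true; // drop the underscore, remember to capitalise
                continue;
            }
            out.append(upperNext ? Character.toUpperCase(c) : Character.toLowerCase(c));
            upperNext = false;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCamelCase("SOME_RANDOM_TEXT")); // prints "someRandomText"
    }
}
```

Like the byte-based method, a leading underscore capitalises the first letter, so `_FOO` becomes `Foo`.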
