13

I'm confused by this code:

public class StringReplaceWithEmptyString 
{
    public static void main(String[] args) 
    {
        String s1 = "asdfgh";
        System.out.println(s1);
        s1 = s1.replace("", "1");
        System.out.println(s1); 
    }
}

And the output is:

asdfgh
1a1s1d1f1g1h1

My first thought was that every character in a String has an empty String "" on both sides. But if that were the case, there should be two '1's after 'a' in the second line of output (one for the end of 'a' and one for the start of 's').

I then checked whether a String is represented as a char[] in these questions: "In Java, is a String an array of chars?" and "String representation in Java". The answer I got was yes.

So I tried to assign an empty character '' to a char variable, but it gives me a compiler error:

Invalid character constant

The same compiler error occurs when I try it in a char[]:

char[] c = {'','a','','s'};  // compile-time error: invalid character constant

So I'm confused about three things:

  1. How is an empty String represented by a char[]?
  2. Why am I getting that output for the above code?
  3. How is the String s1 represented in a char[] when it is first initialized?

Sorry if I'm wrong at any part of my question.

Arun Sudhakaran
  • 3
    I would have expected this from `String#replaceAll` but not from `String#replace` – Tim Biegeleisen Jan 27 '17 at 07:33
  • 2
    @TimBiegeleisen just tested and it actually produces that result with just `String#replace` – Enigo Jan 27 '17 at 07:36
  • 2
    "*how an empty String is represented*" - `char[] empty = {};` –  Jan 27 '17 at 07:36
  • No...I completely believe you, I'm just at a loss to explain. – Tim Biegeleisen Jan 27 '17 at 07:36
  • 1
    @Tim it's because `replace(a, b)` is implemented using (something equivalent to) `replaceAll(quotePattern(a), quoteAsReplacement(b))` internally. Since `quotePattern("")` is `""`, it's going to do the same. – Andy Turner Jan 27 '17 at 07:38
  • Yes, that's right **@a_horse_with_no_name**. When I called **toCharArray()** on an empty String and called the toString method on that array, the output came as **[]**. Thanks – Arun Sudhakaran Jan 27 '17 at 07:41
  • 7
    IMO asking how many empty Strings are in a String is sort of like dividing by zero. – D M Jan 27 '17 at 07:45
  • I think that's just how regex works in Java... – Sweeper Jan 27 '17 at 07:48
  • Interestingly enough, I did not find a description of the matching rules for empty patterns in the [Pattern](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#compile-java.lang.String-) or [Matcher](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Matcher.html) JavaDocs. – Hulk Jan 27 '17 at 08:22
  • @Sweeper The thing is that the method the OP mentioned (`String.replace(CharSequence, CharSequence)`) is not supposed to do the matching using regexes, at least according to its JavaDoc. But good point highlighting that regexes work differently in different languages, which is the main reason why I try to avoid regex as much as possible. – SantiBailors Jan 27 '17 at 08:22
  • 3
    I'm surprised that it inserts exactly one `1` rather than zero. After all, `"ab" == "a" + "" + "b" == "a" + "" + "" + "b"` etc; so saying that there is *1* empty string between them seems... arbitrary. – Andy Turner Jan 27 '17 at 08:41
  • A simple explanation is that when you try to replace all the empty strings in the `String`, the `iterator` starts traversing through the `String`. Both in physics and in Java, if you "zoom in" between two things you will find something separating them (the smallest building block). In physics that is the atom, for example; in our case it is the empty string, which can also be found on both sides of the `String`. – Lazar Lazarov Jan 27 '17 at 09:03

3 Answers

7

Just adding some more explanation to Tim Biegeleisen's answer.

In Java 8, the code of the replace method in the java.lang.String class is:

public String replace(CharSequence target, CharSequence replacement) {
        return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
                this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}

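Here you can see that the replacement is done by a regex Pattern and Matcher (compiled with Pattern.LITERAL). In regex, the empty pattern "" produces a zero-length match, and such a match is found at every position around the characters: before the first one, between each adjacent pair, and after the last one.

So, behind the scenes, your code is executed as follows: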

Pattern.compile("".toString(), Pattern.LITERAL).matcher("asdfgh").replaceAll(Matcher.quoteReplacement("1".toString()));

The output becomes

1a1s1d1f1g1h1
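
As a rough check of the above, the Java 8 body can be reproduced outside of String and compared with the real method. This is only a sketch: the class name EmptyTargetReplaceDemo and the helpers replaceViaLiteralPattern and countMatches are made up for illustration and are not part of the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyTargetReplaceDemo {

    // Hypothetical helper: mirrors the Java 8 body of
    // String.replace(CharSequence, CharSequence) quoted above.
    public static String replaceViaLiteralPattern(String input, String target, String replacement) {
        return Pattern.compile(target, Pattern.LITERAL)
                .matcher(input)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    // Hypothetical helper: counts how many times the literal target matches in input.
    public static int countMatches(String input, String target) {
        Matcher m = Pattern.compile(target, Pattern.LITERAL).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s1 = "asdfgh";
        System.out.println(replaceViaLiteralPattern(s1, "", "1")); // 1a1s1d1f1g1h1
        System.out.println(s1.replace("", "1"));                   // 1a1s1d1f1g1h1
        System.out.println(countMatches(s1, ""));                  // 7, i.e. s1.length() + 1
    }
}
```

The count of 7 for a 6-character string is why each gap gets exactly one "1": there is one zero-length match per position, not one per side of each character.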
Avinash
6

Going with Andy Turner's great comment, your call to String#replace() is actually implemented using (something equivalent to) String#replaceAll(). As such, there is a regex replacement happening here. The matches occur before the first character, in between each pair of characters in the string, and after the last character.

^|a|s|d|f|g|h|$
 ^ this and every pipe matches to empty string ""

The match you are making is a zero-length match. In Java's regex implementation used in String.replaceAll(), this behaves as the example above shows, namely matching each inter-character position as well as the positions before the first and after the last characters.

Here is a reference which discusses zero length matches in more detail: http://www.regexguru.com/2008/04/watch-out-for-zero-length-matches/

A zero-width or zero-length match is a regular expression match that does not match any characters. It matches only a position in the string. E.g. the regex \b matches between the 1 and , in 1,2.
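
To see where zero-length matches land, one can list the start index of every match that Matcher.find() reports. The sketch below is illustrative only: ZeroLengthMatchDemo and matchStarts are hypothetical names, and the `^$` lines are included purely as a contrast with the bare empty pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroLengthMatchDemo {

    // Hypothetical helper: returns the start index of every match of regex in input.
    public static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // The empty pattern matches at every position, including both ends.
        System.out.println(matchStarts("", "asdfgh"));   // [0, 1, 2, 3, 4, 5, 6]
        // ^$ requires start and end of input at the same position.
        System.out.println(matchStarts("^$", "asdfgh")); // []
        System.out.println(matchStarts("^$", ""));       // [0]
        // \b is also zero-length; index 1 is the spot between 1 and , in 1,2.
        System.out.println(matchStarts("\\b", "1,2"));   // [0, 1, 2, 3]
    }
}
```

Each match has the same start and end index, so nothing is consumed; the replacement text is simply inserted at that position.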

Tim Biegeleisen
  • 1
    I think we all can see that, but the question is **why** does it happen? What would be the purpose of having an empty `String` in between the characters? Or perhaps this is a fault of `String#replace()` method? – Dth Jan 27 '17 at 07:45
  • So **@Tim Biegeleisen** what will be the char[] representation of String s1, when it is first initialized – Arun Sudhakaran Jan 27 '17 at 07:45
  • But isn't the method in subject (`String.replace(CharSequence, CharSequence)`) actually supposed to do the match without using regexes, according to its JavaDoc ? The `replaceAll` method does use regexes and it states so in its JavaDoc, but `replace(CharSequence, CharSequence)` doesn't mention regexes anywhere, it actually says _Replaces each substring of this string that matches the **literal** target sequence..._ . That's what leaves me confused. – SantiBailors Jan 27 '17 at 08:30
  • 2
    @SantiBailors Yes, this is what I thought too. Andy Turner, who is a Java architect at Google, seems to think that `replaceAll` is being used under the hood. In any case, the behavior is of a zero length regex replacement. – Tim Biegeleisen Jan 27 '17 at 08:32
  • It's true, `replaceAll` is being used under the hood. I think the JavaDoc of `replace(CharSequence, CharSequence)` should be fixed. – SantiBailors Jan 27 '17 at 08:36
  • but **@Tim Biegeleisen** here [http://stackoverflow.com/questions/3164452/regex-for-specifying-an-empty-string] it says that regex for empty string is **^$** – Arun Sudhakaran Jan 27 '17 at 09:52
  • @ArunSudhakaran That is a regex for matching nothing; empty double quotes will cause a zero-length match. – Tim Biegeleisen Jan 27 '17 at 09:55
  • but **"".matches("^$")** returns true, and I didn't understand what is this zero-length match – Arun Sudhakaran Jan 27 '17 at 10:05
  • 2
    `^$` is saying the _entire_ string has nothing in it, hence empty string alone matches. Searching for `""` unbounded means find boundaries between every character in the string. Makes sense? – Tim Biegeleisen Jan 27 '17 at 10:10
2

This is because replace() does a regex match using the target and replacement you pass to it.

 public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
     this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
 }

Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence. The replacement proceeds from the beginning of the string to the end, for example, replacing "aa" with "b" in the string "aaa" will result in "ba" rather than "ab".

Parameters:

target The sequence of char values to be replaced

replacement The replacement sequence of char values

Returns: The resulting string

Throws: NullPointerException if target or replacement is null.

Since: 1.5

Please read more at the link below ... (Also browse through the source code).

http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/String.java#String.replace%28java.lang.CharSequence%2Cjava.lang.CharSequence%29

A regex such as "" matches every possible empty string in a string. In this case that means every position in the string: at the start, at the end, and after every character.
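
A small sketch of the same idea without any regex, assuming only standard String methods (the class name EmptySubstringPositionsDemo and the helper countEmptyStringPositions are invented for this example). It shows that no "empty characters" are stored among the characters of the string, yet the empty string is still found at every index from 0 to length().

```java
public class EmptySubstringPositionsDemo {

    // Hypothetical helper: counts the indexes i in [0, s.length()]
    // at which indexOf finds the empty string.
    public static int countEmptyStringPositions(String s) {
        int count = 0;
        for (int i = 0; i <= s.length(); i++) {
            if (s.indexOf("", i) == i) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String s1 = "asdfgh";
        System.out.println(s1.toCharArray().length);       // 6: only the six letters
        System.out.println("".toCharArray().length);       // 0: an empty array
        System.out.println(countEmptyStringPositions(s1)); // 7: positions 0 through 6
        System.out.println("aaa".replace("aa", "b"));      // ba, as in the JavaDoc example
    }
}
```

So the empty string is a position in the string rather than an element of it, which is why '' cannot be written as a char constant.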

  • Which method does the doc in your quote refer to (_Splits this string around matches of the given regular expression..._) ? – SantiBailors Jan 27 '17 at 08:56
  • 1
    Sorry, wrong (hasty) copy and paste. Correcting. –  Jan 27 '17 at 09:00
  • 1
    Thanks, so now it's the JavaDoc of `replace(CharSequence, CharSequence)`. This highlights the problem (in my opinion): the doc of the method does not mention matching with regex at all, while the doc of `replaceAll` does. – SantiBailors Jan 27 '17 at 09:07