
I have regex="" and a String str="stackoveflow";

I don't understand why it is matching every character in the string. Can you explain it to me?

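import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    public static void main(String[] args) {
        String str = "stackoveflow";
        Pattern pattern = Pattern.compile("");
        Matcher matcher = pattern.matcher(str);
        // Print every match that find() reports: its text plus start/end indexes.
        // System.out is used instead of System.console(), which returns null
        // when no console is attached (for example when run from an IDE).
        while (matcher.find()) {
            System.out.format("I found the text" +
                " \"%s\" starting at " +
                "index %d and ending at index %d.%n",
                matcher.group(),
                matcher.start(),
                matcher.end());
        }
    }
}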

Output is:

I found the text "" starting at index 0 and ending at index 0.
I found the text "" starting at index 1 and ending at index 1.
I found the text "" starting at index 2 and ending at index 2.
I found the text "" starting at index 3 and ending at index 3.
I found the text "" starting at index 4 and ending at index 4.
I found the text "" starting at index 5 and ending at index 5.
I found the text "" starting at index 6 and ending at index 6.
I found the text "" starting at index 7 and ending at index 7.
I found the text "" starting at index 8 and ending at index 8.
I found the text "" starting at index 9 and ending at index 9.
I found the text "" starting at index 10 and ending at index 10.
I found the text "" starting at index 11 and ending at index 11.
I found the text "" starting at index 12 and ending at index 12.
Stephen C
sevenpi
  • Because `e*` matches an empty string as well, it doesn't match every character; it matches the empty string in front of every character. – Sebastian Proske Aug 25 '18 at 09:13
  • An empty regex matches at every position in the input string. So for a string of 12 characters it "fits" at 13 positions. – rustyx Aug 25 '18 at 09:20
  • It’s not Java, but related to your question: [The empty regular expression](http://2ality.com/2012/09/empty-regexp.html). – Ole V.V. Aug 25 '18 at 09:43
  • It’s a convention that the empty string matches the empty string. For most purposes it is the most practical. No matter which decision had been taken, I’m sure some would be surprised sometimes. The string from index 0 to index 0 *is* the empty string, so to me it’s quite logical. – Ole V.V. Aug 25 '18 at 09:49
  • When you changed the `Pattern.compile("e*");` to `Pattern.compile("");` you changed its meaning entirely. But it turns out that the answer is almost the same. – Stephen C Aug 25 '18 at 09:56
  • @rustyx, there are only 12 positions in the string. What is that extra 13th match? – sevenpi Aug 25 '18 at 11:42
  • @All, there are only 12 positions in the string. What is that extra 13th match? The string is a set of 12 characters, so where is the empty string hiding? There is no place for it to sit. I am sorry, I could not understand. – sevenpi Aug 25 '18 at 11:47
  • There are 12 characters. You can think of it this way: there’s an empty string before the first character, there’s an empty string between each pair of characters (that’s 11), and there’s an empty string after the last one. The total is 13. See [Fencepost error](https://en.wikipedia.org/wiki/Off-by-one_error#Fencepost_error). – Ole V.V. Aug 26 '18 at 14:36
  • @OleV.V. Thanks, that helps a lot. So when I think of any string, I should consider that it contains empty strings as you described. Can you point me to any reference online that explains concepts like this? – sevenpi Aug 26 '18 at 15:16
  • If the link I already gave isn’t enough, search for [fencepost programming] or similar and get lots of hits. Or use search terms that better match what you are uncertain about. – Ole V.V. Aug 26 '18 at 15:25
  • @OleV.V. Thanks. Surprisingly, I understood it here, but I found nowhere else that says a string actually contains empty strings as you explained. – sevenpi Aug 26 '18 at 15:29
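The fencepost count from the comments above can be checked with plain `String.substring`, no regex involved. The sketch below (the class `EmptyPositions` and helper `emptySlots` are hypothetical names for illustration) walks every boundary index from 0 to `length()` inclusive. `substring(i, i)` is legal for each of them, including `i == length()`, and always yields `""`, so a 12-character string has 13 places where an empty string "sits".

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyPositions {
    // Hypothetical helper: collects every boundary index i where substring(i, i) is "".
    static List<Integer> emptySlots(String s) {
        List<Integer> slots = new ArrayList<>();
        // Note the <= : a string of n characters has n + 1 boundaries (fenceposts).
        for (int i = 0; i <= s.length(); i++) {
            // Always true; the point is that substring(i, i) is valid even for i == length().
            if (s.substring(i, i).isEmpty()) {
                slots.add(i);
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        String str = "stackoveflow";
        List<Integer> slots = emptySlots(str);
        System.out.println(str.length() + " characters, " + slots.size() + " empty-string positions");
        System.out.println(slots);
    }
}
```

The loop bound `i <= s.length()` is exactly the fencepost: using `<` would silently drop the empty string after the last character.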

1 Answer

Pattern.compile("") matches a string consisting of zero characters. You can find one of those at every position in the string, including the position after the last character.
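A minimal sketch that counts those zero-length matches, assuming nothing beyond `java.util.regex`; the class `EmptyMatchCount` and helper `matchStarts` are hypothetical names. It records the start index of every match that `find()` reports, so you can compare the number of matches with `str.length()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchCount {
    // Hypothetical helper: start index of every match reported by find().
    static List<Integer> matchStarts(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String str = "stackoveflow";
        List<Integer> starts = matchStarts("", str);
        // One zero-length match per position, including the one after the last character.
        System.out.println("length = " + str.length() + ", matches = " + starts.size());
        System.out.println("starts = " + starts);
    }
}
```

For any input the empty pattern yields `length() + 1` matches; even the empty string `""` produces one match, at index 0.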

Note: if you changed find() to matches(), you should find that there are no matches. (With matches() the pattern needs to match the entire input, and the entire input is not a sequence of zero characters.)
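To see the difference between the three `Matcher` methods side by side, here is a small sketch; the class `FindVsMatches` and helper `check` are hypothetical names. Each method gets a fresh `Matcher` so that one call's state cannot affect the next. `find()` looks for a match anywhere, `lookingAt()` requires a match starting at index 0, and `matches()` requires the pattern to cover the whole input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindVsMatches {
    // Hypothetical helper: returns {find(), lookingAt(), matches()} for regex on input.
    static boolean[] check(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);
        // Separate matchers so each method starts from a clean state.
        Matcher finder = pattern.matcher(input);
        Matcher looker = pattern.matcher(input);
        Matcher whole = pattern.matcher(input);
        return new boolean[] { finder.find(), looker.lookingAt(), whole.matches() };
    }

    public static void main(String[] args) {
        boolean[] r = check("", "stackoveflow");
        System.out.println("find()      -> " + r[0]); // zero-length match at index 0
        System.out.println("lookingAt() -> " + r[1]); // a match may be zero-length at the start
        System.out.println("matches()   -> " + r[2]); // must cover all 12 characters
        // The only input that the empty pattern matches entirely is "" itself.
        System.out.println("matches() on \"\" -> " + check("", "")[2]);
    }
}
```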


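Before you edited the question, your pattern was Pattern.compile("e*"). That means zero or more repetitions of the character 'e'. By the logic above, you can "find" one of those at every character position in the input. (At index 7, where the 'e' is, the match is the one-character string "e" rather than an empty one.)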
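The sketch below lists each match of the original `e*` pattern as `start-end:'text'`; the class `StarMatches` and helper `describeMatches` are hypothetical names. Most matches are zero-length, and the `'e'` at index 7 shows up as a one-character match `7-8:'e'`. After a non-empty match, `find()` resumes at the match's end, so an empty match at index 8 follows it. Empty matches at 0..6, the `"e"`, and empty matches at 8..12 add up to 13 results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StarMatches {
    // Hypothetical helper: describes each find() match as "start-end:'text'".
    static List<String> describeMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.start() + "-" + matcher.end() + ":'" + matcher.group() + "'");
        }
        return found;
    }

    public static void main(String[] args) {
        // "e*" can match zero characters, so it also succeeds where there is no 'e'.
        for (String m : describeMatches("e*", "stackoveflow")) {
            System.out.println(m);
        }
    }
}
```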

Stephen C
  • There are only 12 positions in the string. What is that extra 13th match? The string is a set of 12 characters, so where is the empty string hiding? There is no place for it to sit. I am sorry, I could not understand. – sevenpi Aug 25 '18 at 11:46
  • There are actually 13 positions if you count the position after the last character. That is what the Matcher is doing. Clearly. – Stephen C Aug 25 '18 at 11:55
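One way to make those 13 positions visible is to let the empty regex insert a marker wherever it matches. `String.replaceAll` treats its first argument as a regex, so `replaceAll("", "|")` puts a `|` at every zero-length match, including the one after the last character. The class `MarkPositions` and helper `markEmptyMatches` are hypothetical names for this sketch.

```java
public class MarkPositions {
    // Hypothetical helper: inserts marker at every position where the empty regex matches.
    static String markEmptyMatches(String input, String marker) {
        // replaceAll's first argument is a regex; "" matches at every boundary.
        return input.replaceAll("", marker);
    }

    public static void main(String[] args) {
        String str = "stackoveflow";
        String marked = markEmptyMatches(str, "|");
        System.out.println(marked);
        // Each one-character marker adds one character, so the growth equals the match count.
        System.out.println("markers inserted: " + (marked.length() - str.length()));
    }
}
```

The output shows a marker before the first character, between every pair of characters, and after the last one: `|s|t|a|c|k|o|v|e|f|l|o|w|`. That is 13 markers for 12 characters.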