The important bit is that a regexp can never match twice at the same position, and matches cannot overlap. Furthermore, note that there are six possible positions involved in "hello": one at the start of each character, and one at the very end (see fenceposting).
When you start searching for /.*/, there's a match at position 0, and it takes up five characters. This disqualifies positions 0, 1, 2, 3 and 4 from further matches (as they are part of the first match).
The second match starts matching at position 5, and finds a match for "0 or more characters" - namely, 0 characters. The position 5 is not contained in the first match, and so not disqualified by the "no overlap" rule.
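The position bookkeeping above can be checked directly. Here is a minimal sketch in Python (the variable names are mine) that uses re.finditer to report where each match of .* starts and ends:

```python
import re

text = "hello"

# Collect the (start, end) span of every match of .* in the string.
spans = [m.span() for m in re.finditer(r".*", text)]

print(spans)  # [(0, 5), (5, 5)]
```

The first span covers positions 0 through 4; the second is the empty match sitting at position 5.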
When you anchor the start with /^.*/, position 5 becomes ineligible, as it is not the start.
When you anchor the end with /.*$/, both position 0 and position 5 will detect that, after their 5-character or 0-character match respectively, they are at the end of the search string, and thus you still get both matches.
When you change the regexp to "1 or more characters" with /.+/, position 5 is again ineligible, because there are no more characters to match but at least 1 is required.
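The three variations can be compared side by side. This is only a sketch in Python, with match_spans being a helper name I made up for the illustration:

```python
import re

text = "hello"

def match_spans(pattern):
    """Return the (start, end) span of each match of pattern in text."""
    return [m.span() for m in re.finditer(pattern, text)]

unanchored = match_spans(r".*")     # matches at position 0 and at position 5
start_anchor = match_spans(r"^.*")  # position 5 is not the start, so one match
end_anchor = match_spans(r".*$")    # both matches end at the end of the string
one_or_more = match_spans(r".+")    # nothing left to consume at position 5

for name, result in [("/.*/", unanchored), ("/^.*/", start_anchor),
                     ("/.*$/", end_anchor), ("/.+/", one_or_more)]:
    print(name, result)
```

Only the unanchored and end-anchored patterns produce the extra empty match at position 5.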
Note also that it is not just Ruby; the same behaviour is found in all the engines I tested. Python's sub is a bit inconsistent (because of its adjacency condition: before Python 3.7 it did not replace an empty match adjacent to a previous match), but findall reports the same two matches:
re.findall('.*', 'hello') # => ['hello', '']
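To see that the doubled replacement follows directly from those two matches, here is a rough sketch of a global replace built by hand on top of re.finditer. The replace_all helper is hypothetical, written only for this illustration, and it sidesteps whatever re.sub itself does with adjacent empty matches:

```python
import re

def replace_all(pattern, repl, text):
    """Replace every match that re.finditer reports, empty ones included."""
    out = []
    last = 0
    for m in re.finditer(pattern, text):
        out.append(text[last:m.start()])  # unmatched text before this match
        out.append(repl)                  # one replacement per match
        last = m.end()
    out.append(text[last:])               # unmatched tail, if any
    return "".join(out)

result = replace_all(r".*", "abc", "hello")
print(result)  # abcabc
```

Two matches mean two replacements, which is the same result the other engines give.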
JavaScript works just like Ruby:
"hello".replace(/.*/g, "abc") // => "abcabc"
As does Java:
"hello".replaceAll(".*", "abc") // => "abcabc"
And even PHP (using PREG):
preg_replace('/.*/', 'abc', 'hello'); # => "abcabc"