
I am confused about the behaviour of the .* regular expression in JavaScript if the global flag is used:

var str = "Hello World!",
    reg = new RegExp(".*","g");
var matched = str.match(reg); 
// matched == ["Hello World!", ""]

I don't understand why it adds an empty string to the matched array. This doesn't happen when the regular expression has no global g flag.

What is the logic behind it?

thefourtheye
BartoNaz
  • This question pops up now and then. The `.*` matches an empty string at the end of the string when the `g` flag is specified. – Wiktor Stribiżew Jul 12 '16 at 06:18
  • Please look at [the following answer](http://stackoverflow.com/a/1520853/6551577). It could be helpful for you. – Alex M Jul 12 '16 at 06:23

3 Answers


Every string has empty strings before and after each and every character.

Here you are doing greedy matching with .*, which means "zero or more characters, matching as many as possible". First, .* matches Hello World!. Because of the global modifier, the engine then tries to match again from where the previous match ended, and it matches the empty string at the end of the string (this succeeds because .* also allows zero characters). That is why you get the empty string in the result.

You can confirm this with +, like this

var str = "Hello World!";
var reg = new RegExp(".+", "g");
console.log(str.match(reg));
// [ 'Hello World!' ]

Here, + means one or more times. After .+ matches Hello World!, the global modifier makes the engine search again, but it finds no more characters to match.
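To see where each match lands, you can print the match positions as well. This is only a minimal sketch: it assumes an engine that supports String.prototype.matchAll (ES2020), and locate is a hypothetical helper name, not something from the question.

```javascript
const str = "Hello World!";

// Hypothetical helper: list [index, text] for every match of a global regex.
const locate = re => Array.from(str.matchAll(re), m => [m.index, m[0]]);

const starMatches = locate(/.*/g);
const plusMatches = locate(/.+/g);

console.log(starMatches); // [ [ 0, 'Hello World!' ], [ 12, '' ] ]
console.log(plusMatches); // [ [ 0, 'Hello World!' ] ]
```

The empty match sits at index 12, which is the position just past the last character.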


Want to see something interesting? Try this

var str = "Hello World!";
var reg = new RegExp(".*?", "g");
console.log(str.match(reg));
// [ '', '', '', '', '', '', '', '', '', '', '', '', '' ]

Why is that? .*? means match zero or more characters, but as few as possible (non-greedy). So it starts at the first character, finds an empty string, and ends that match. The global modifier makes it match again, which finds another empty string after H, and so on until the end of the string is reached.

But if you used +, like this

var str = "Hello World!";
var reg = new RegExp(".+?", "g");
console.log(str.match(reg));
// [ 'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '!' ]

It has to match one or more characters, but as few as possible. So it matches one character and stops. The global modifier makes it match again, which matches the second character, and so on.
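The difference between the two non-greedy patterns shows up in the match positions too. A small sketch, again assuming String.prototype.matchAll is available; lazyStar and lazyPlus are just illustrative names.

```javascript
const str = "Hello World!";

// Start index of every match for each non-greedy pattern.
const lazyStar = Array.from(str.matchAll(/.*?/g), m => m.index);
const lazyPlus = Array.from(str.matchAll(/.+?/g), m => m.index);

console.log(lazyStar.join(",")); // 0,1,2,3,4,5,6,7,8,9,10,11,12
console.log(lazyPlus.join(",")); // 0,1,2,3,4,5,6,7,8,9,10,11
```

.*? yields one empty match at each of the 13 positions (including the one after the last character), while .+? yields one match per character, so 12 in total.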

thefourtheye

The dot star matches any character (except the newline) greedily (zero or more times):

  1. The first step matches the whole string (Hello World!)
  2. The second attempt starts at the end of the previous match and tries the same pattern again. No characters are left, but the dot matches any character and the star says zero or more times, so the pattern matches the empty string at that position.
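The two steps above can be replayed by hand with RegExp.prototype.exec and lastIndex. This is only a sketch of what a global match does, not how str.match is literally implemented; the manual lastIndex bump is needed here because a zero-length match would otherwise be found again forever.

```javascript
const str = "Hello World!";
const re = /.*/g;
const found = [];
let m;

while ((m = re.exec(str)) !== null) {
  // Record the matched text and the position where it starts.
  found.push({ text: m[0], index: m.index });
  // An empty match leaves lastIndex unchanged, so step past it manually.
  if (m[0] === "") {
    re.lastIndex++;
  }
}

console.log(found);
// [ { text: 'Hello World!', index: 0 }, { text: '', index: 12 } ]
```

After the empty match, lastIndex is moved beyond the end of the string, so the next exec call returns null and the loop ends.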

Maybe an image makes clearer what the dot-star actually does:

regex101.com

Additionally a link with capture groups on regex101.com.

Jan

"*" means zero or more, so you should use "+" instead. Try this:

var str = "Hello World!",
reg = new RegExp(".+","g");
var matched = str.match(reg); 
Bowen Li