[A-Za-z0-9]* {2}
Quantifiers in regular expressions will always only apply to the element right in front of them. So a \d+
will look for one or more digits but x\d+
will look for a single x
, followed by one or more digits.
If you want a quantifier to apply to more than just a single thing, you need to group it first, e.g. (x\d)+
. This is a capturing group, so it will actually capture that in the result. This is sometimes undesired if you just want to group things to apply a common quantifier. In that case, you can prefix the group with ?:
to make it a non-capturing group: (?:x\d)+
.
So, going back to your regular expression, you would have to do it like this:
([A-Za-z0-9]* ){2}
However, this does not actually have any check that the second matched word is the same as the first one. If you want to match for that, you will need to use backreferences. Backreferences allow you to reference a previously captured group within the expression, looking for it again. In your case, this would look like this:
([A-Za-z0-9]*) \1
The \1
will reference the first capturing group, which is ([A-Za-z0-9]*)
. So the group will match the first word. Then, there is a space, followed by a backreference to the first word again. So this will look for a repetition of the same word separated by a space.
As bobble bubble points out in the comments, there is still a lot one can do to improve the regular expression. While my main concern was to explain the various concepts without focusing too much on your particular example, I guess I still owe you a more robust regular expression for matching two consecutive words within a string that are separated by a space. This would be my take on that:
\b(\w+)\s\1\b
There are a few things that are different to the previous approach: First of all, I’m looking for word boundaries around the whole expression. The \b
matches basically when a word starts or ends. This will prevent the expression from matching within other words, e.g. neither foo fooo
nor foo oo
would be matched.
Then, the regular expression requires at least one character. So empty words won’t be matched. I’m also using \w
here which is a more flexible way of including alphanumerical characters. And finally, instead of looking for an actual space, I accept any kind of whitespace between the words, so this could even match tabs or line breaks. It might make sense to add a quantifier there too, i.e. \s+
to allow multiple whitespace characters.
Of course, whether this works better for you, depends a lot on your actual requirements which we won’t be able to tell just from your one example. But this should give you a few ideas on how to continue at least.