Why is there no "match all" special character in regex?

Question

I love regex, but I find it confusing that there is no "match all" special character. For example, if I wanted to select an HTML tag and its contents, I would write

re = "<tag>([\s\S]*)</tag>"

Here, [\s\S] is a workaround for the absence of a match-all special character. Is there a reason why a match-all is missing from the spec? I know about . but it's not that pretty either: re = "<tag>([.\n]*)</tag>"

`[^]` would be neat, but it caused an error in my Python code. Something like "no closing brackets detected", if I remember correctly — Karveiani, Jul 14 '20 at 02:00
We can't answer "why" questions like this. They are how they are. — Barmar, Jul 14 '20 at 02:00
@theX "Is there a reason why a match-all is missing from the spec?" — Karveiani, Jul 14 '20 at 02:00
@Barmar I guess you could say "it is what it is". I was just assuming that there would be a historical reason. Or that people use regex in a different style (compared to mine), so, that a match-all would be considered an anti-pattern. — Karveiani, Jul 14 '20 at 02:02
@Karveiani it’s probably because then, you’d have to type in `[^\S\s]` instead of `.` — theX, Jul 14 '20 at 02:03
@theX `]` appearing right after `[` or `[^` includes `]` into the character class instead of ending it (in PCRE, at least); e.g. `\[[^][]*\]` to match something in brackets. (Also: this breaks regexr.com. Funsies!) — HTNW, Jul 14 '20 at 02:04
`[.\n]` would be a `.` or new line, for new line or any character it would be `(?:.|\n)` — user3783243, Jul 14 '20 at 02:11
Find the team that developed the particular flavor of regex you're using and ask them. We can't speculate on why they did or did not provide a specific feature. The better question is why you're attempting to use a regex to parse HTML or XML when you can use a DOM parser instead. Obligatory link about the [futility of trying to parse X/HTML with a regex](https://stackoverflow.com/a/1732454/62576). — Ken White, Jul 14 '20 at 02:12
@KenWhite Before posting this question I was wondering whether match-all behaviour is considered bad practice, given that it's not implemented in regex. Barmar responded that it would come with performance issues. So I really don't understand why you downvoted this question — Karveiani, Jul 14 '20 at 02:24
Also I'm not going to parse XML with regex so don't worry. It was an example — Karveiani, Jul 14 '20 at 02:26
Who said I downvoted? I posted a comment. You should be careful about making accusations without proof. — Ken White, Jul 14 '20 at 02:49
@KenWhite Someone downvoted my question and you were the only one coming after me in the comments. — Karveiani, Jul 14 '20 at 02:55
I made basically the same comment as Barmar did, six comments above mine. Again, you should be careful about making accusations without any evidence. I didn't *come after you*. If you feel like I did, you should develop a less sensitive personality when participating here. Not every comment is an *attack*. — Ken White, Jul 14 '20 at 03:02
The answer is to use `.` and follow https://stackoverflow.com/a/45981809/3832970 post. Sometimes, `.` matches just any char. — Wiktor Stribiżew, Jul 14 '20 at 07:43
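
The points raised in the comments above can be checked with a short Python sketch. It assumes Python's built-in `re` module (other flavors may behave differently), and the sample string `text` is made up for illustration.

```python
import re

# Hypothetical sample text containing a literal dot and a newline.
text = "a.b\nc"

# Inside a character class, `.` is a literal dot,
# so [.\n] matches only "." or "\n", not "any character".
dot_or_newline = re.findall(r"[.\n]", text)
print(dot_or_newline)  # ['.', '\n']

# Three ways to match any character, newline included.
any_char_patterns = [r"[\s\S]", r"(?:.|\n)", r"(?s:.)"]
results = [re.findall(p, text) for p in any_char_patterns]
print(all(r == list(text) for r in results))  # True

# `[^]` is rejected by Python's re module: the `]` right after `[^`
# is taken as a literal member, so the class is never closed.
try:
    re.compile(r"[^]")
    empty_negated_class_ok = True
except re.error:
    empty_negated_class_ok = False
print(empty_negated_class_ok)  # False
```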

Barmar · Accepted Answer · 2020-07-14T02:06:31.947

`.` is the match-all character. By default it doesn't match newlines, but if you set the DOTALL flag it matches every character, newlines included. In Python you write:

re.search(r"<tag>(.*)</tag>", string, re.DOTALL)
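
As a minimal, self-contained sketch of the difference (the sample string `text` is hypothetical and not from the question), the same pattern can be run with and without the flag:

```python
import re

# Hypothetical sample text; the tag contents span two lines.
text = "<tag>first line\nsecond line</tag>"

pattern = r"<tag>(.*)</tag>"

# Default: `.` does not match "\n", so the pattern cannot span both lines.
default_match = re.search(pattern, text)

# With re.DOTALL, `.` also matches "\n".
dotall_match = re.search(pattern, text, re.DOTALL)

# The inline flag (?s) has the same effect as passing re.DOTALL.
inline_match = re.search(r"(?s)<tag>(.*)</tag>", text)

print(default_match)                 # None
print(repr(dotall_match.group(1)))   # 'first line\nsecond line'
print(inline_match.group(1) == dotall_match.group(1))  # True
```

Note that `.*` is greedy, so with several `<tag>` elements in the input it would run to the last `</tag>`; `.*?` is the usual non-greedy alternative.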

Why isn't this the default? Probably because most regexp applications want to limit matches to a single line (partly for performance reasons). And having two separate metacharacters, one for "match all" and another for "match all except newline", would have used up an extra special character for little gain.

answered Jul 14 '20 at 02:02 by Barmar (edited Jul 14 '20 at 02:06)

In POSIX-based regex flavors, `.` **does** match **any character, including newlines**. I think [this answer](https://stackoverflow.com/a/45981809/3832970) is enough, and there is no need to re-post the same. – Wiktor Stribiżew Jul 14 '20 at 07:47
