Match Only Text

Question

    var regex = "<a\b[^>]*>(.*?<span\b[^>]*>(.*?)<\/span>)<\/a>";

    <a href="/computers">Computers<span>(1896)</span></a>

How can I get only the "Computers" text?
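To see why the question's pattern returns more than "Computers", here is a small sketch in Python. Python is used only for illustration, since the question does not name its language (the comments point to .NET). It runs the question's pattern against the sample link and prints both capture groups. Group 1 opens before `.*?` but closes only after `<\/span>`, so it wraps the whole `<span>` element. The pattern is written as a raw string: in an ordinary string literal in Python (and likewise in C# and JavaScript), `\b` becomes a backspace character instead of a regex word boundary.

```python
import re

# The question's pattern, written as a raw string so that \b reaches the
# regex engine as a word boundary rather than a backspace character.
QUESTION_PATTERN = r"<a\b[^>]*>(.*?<span\b[^>]*>(.*?)<\/span>)<\/a>"

html = '<a href="/computers">Computers<span>(1896)</span></a>'

match = re.search(QUESTION_PATTERN, html)
if match:
    # Group 1 wraps the link text AND the whole <span> element.
    print(match.group(1))  # Computers<span>(1896)</span>
    # Group 2 is only the span's content.
    print(match.group(2))  # (1896)

# Pitfall: in a non-raw literal, "\b" is a backspace (\x08), so this never matches.
broken = re.search("<a\b[^>]*>", html)
print(broken)  # None
```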

Don't use regex for HTML; use [HtmlAgilityPack](https://html-agility-pack.net/) or [AngleSharp](https://github.com/AngleSharp/AngleSharp) – maccettura, Dec 05 '18 at 20:55
Though I agree with the other suggestions, using a parser to build an entire DOM just to extract one substring is overkill. – eocron, Dec 05 '18 at 20:57
Possible duplicate of [RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – maccettura, Dec 05 '18 at 21:17
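For comparison, here is a hedged sketch of the parser route the comments recommend. HtmlAgilityPack and AngleSharp are .NET libraries and are not used here; as a stand-in, this uses Python's standard-library `html.parser.HTMLParser`. It collects only the text that sits directly inside `<a>` and outside any nested `<span>`. The class name `LinkTextParser` is hypothetical.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collects text directly inside <a>, skipping any nested <span> content."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.span_depth = 0  # counts open <span> tags inside the link
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag == "a":
            self.in_link = True
        elif tag == "span" and self.in_link:
            self.span_depth += 1

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
        elif tag == "span" and self.span_depth > 0:
            self.span_depth -= 1

    def handle_data(self, data):
        # Keep text only when inside <a> but not inside a <span>.
        if self.in_link and self.span_depth == 0:
            self.parts.append(data)


parser = LinkTextParser()
parser.feed('<a href="/computers">Computers<span>(1896)</span></a>')
parser.close()  # flushes any buffered text
link_text = "".join(parser.parts).strip()
print(link_text)  # Computers
```

Tracking a span depth instead of a single flag keeps the sketch correct if spans are nested inside the link.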

score -2 · Answer 1 · edited Dec 06 '18 at 14:52

You have a ")" in the wrong place:

    <a\b[^>]*>(.*?)<span\b[^>]*>(.*?)<\/span><\/a>
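A minimal Python check of the corrected pattern (Python is chosen only for illustration, because the question does not state its language). With group 1 now closing right after `.*?`, it captures only the link text, and group 2 captures the count inside the span.

```python
import re

# Corrected pattern: group 1 now closes before <span ...>.
FIXED_PATTERN = r"<a\b[^>]*>(.*?)<span\b[^>]*>(.*?)<\/span><\/a>"

html = '<a href="/computers">Computers<span>(1896)</span></a>'

match = re.search(FIXED_PATTERN, html)
if match:
    print(match.group(1))  # Computers
    print(match.group(2))  # (1896)
```

In .NET the equivalent would be `Regex.Match(html, @"<a\b[^>]*>(.*?)<span\b[^>]*>(.*?)<\/span><\/a>").Groups[1].Value`, where the verbatim `@"..."` string keeps `\b` as a word boundary.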

FYI: search for a regex tester you like. Some are available as Visual Studio extensions, others are online, and some are built into other tools such as Komodo.

I tossed your string and expression into https://regex101.com/ and had it working in less than a minute.

answered Dec 05 '18 at 21:06 by Frank Merrow · edited Dec 06 '18 at 14:52 by maccettura

But, again, Regex is the wrong tool for parsing HTML - it's not a _regular_ language. – Flydog57 Dec 05 '18 at 21:07
I just answered the question. "Wrong tool" is a separate argument, and who wins it depends on context we don't have here. Generally you are likely correct, but in some contexts this might be the right approach; we don't have enough information to know. – Frank Merrow Dec 05 '18 at 21:10
@FrankMerrow I suggest you read this [popular answer](https://stackoverflow.com/a/1732454/4416750). There isn't a context where attempting to parse HTML with RegEx is a good idea. – Lews Therin Dec 05 '18 at 21:22
@FrankMerrow, I was referencing the previous comments on the question. It's not just the wrong tool, like using a hammer to drive in a screw; it's more like using an ohmmeter. If you can get it to work, the solution is extremely fragile, and fragile in a way where you need Fix-A the first time it breaks but a completely different Fix-B the next time. I didn't vote you down, by the way; you answered the question. – Flydog57 Dec 05 '18 at 21:34
As the "popular answer" points out, if you are trying to parse an HTML document, then of course a DOM or other parser is the better way to go, and using a regex in that context would be very wrong-headed. However, the question as posed is: how do I find "Computers" in this string? If, instead of a whole HTML document, you have _only_ the string offered, then the choice is less obvious. You could fire up a DOM parser to parse about 50 characters of HTML... I wouldn't. It's a choice. If the poster had said this string was part of a larger HTML document, then I would completely agree with you. – Frank Merrow Dec 06 '18 at 04:53
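To make the fragility argument concrete without settling the debate, here is a small Python sketch. The corrected pattern works on the exact string from the question, but the two variants below are hypothetical inputs invented for illustration. A quoted attribute value may legally contain `>`, which makes `[^>]*` stop early; a pretty-printed version of the same link does not match at all, because `.` does not match newlines by default and `</span>` and `</a>` are no longer adjacent.

```python
import re

FIXED_PATTERN = r"<a\b[^>]*>(.*?)<span\b[^>]*>(.*?)<\/span><\/a>"

samples = {
    # The string from the question: works as intended.
    "original": '<a href="/computers">Computers<span>(1896)</span></a>',
    # Hypothetical variant: '>' inside a quoted attribute value ends [^>]*
    # early, so group 1 swallows the tail of the attribute.
    "gt_in_attribute": '<a href="/computers" title="a>b">Computers<span>(1896)</span></a>',
    # Hypothetical variant: the same link spread over several lines.
    "pretty_printed": '<a href="/computers">\n  Computers\n  <span>(1896)</span>\n</a>',
}

results = {}
for name, html in samples.items():
    match = re.search(FIXED_PATTERN, html)
    # Store group 1 when the pattern matches, otherwise None.
    results[name] = match.group(1) if match else None
    print(f"{name}: {results[name]!r}")
```

On the question's string group 1 is `'Computers'`; with `>` in the attribute it becomes `'b">Computers'`; the pretty-printed variant yields `None`. Whether such inputs can occur is exactly the context the thread says is missing.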
