
I have an HTML element like the one below:

<a href="/Test/URL/Page/" title="" >Test</a>

I am trying to extract the value 'Page' from the href. I tried `/href="\/(.*)\/"`, which captures 'Test/URL/Page', but I couldn't figure out how to proceed from there.
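Here is a quick way to check what this first attempt does. It uses Python's `re` module only as a neutral scratchpad. The constructs involved should backtrack the same way in .NET's regex engine. The greedy `(.*)` runs as far as it can and then backs off to the last `/"` on the line, so the whole path between the outer slashes ends up in the group:

```python
import re

html = '<a href="/Test/URL/Page/" title="" >Test</a>'

# First attempt from the question: greedy (.*) between the slash after href="
# and the LAST '/"' it can find on the line.
match = re.search(r'href="\/(.*)\/"', html)
print(match.group(1))  # Test/URL/Page
```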

I also tried `/href="\/([^\/]*$)\/"`, but it doesn't match at all.
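A likely reason the second attempt never matches is that `$` asserts the end of the input, but the pattern still requires `/"` after it. No position can satisfy both conditions. Here is a minimal Python check, again using `re` only as a scratchpad for the regex:

```python
import re

html = '<a href="/Test/URL/Page/" title="" >Test</a>'

# "$" means "end of string" here (no MULTILINE flag), yet the pattern still
# expects the characters '/"' after it, so re.search finds nothing.
match = re.search(r'href="\/([^\/]*$)\/"', html)
print(match)  # None
```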

Without going into details, I do not want to use an HTML parser or C# code.

Edited by Andy Lester; asked by Tony Stark.
  • Is the string you want always going to be the third item? Or are you going after the last item? Any other details you can provide on why you are extracting 'Page'? – mittmemo Feb 04 '15 at 17:51
  • Exactly how do you think this regex would work if you don't have to use a programming language? You'd still need C# to RUN whatever regex you end up with... – Marc B Feb 04 '15 at 17:53
  • I understand that you don't want to use an HTML parser, but any other solution is asking for sorrow down the road. http://htmlparsing.com/regexes.html gives some examples of valid HTML that will stymie your regexes. – Andy Lester Feb 04 '15 at 17:55
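Below is a minimal sketch of what the parser route from the last comment could look like. It uses Python's standard-library `html.parser` purely for illustration, and the class name `HrefCollector` is hypothetical. Taking the *last* non-empty path segment is one possible answer to mittmemo's "third item or last item?" question. It is an assumption, not something the asker confirmed.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it is fed."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


parser = HrefCollector()
parser.feed('<a href="/Test/URL/Page/" title="" >Test</a>')
parser.close()

# Last non-empty path segment of each href: "/Test/URL/Page/" -> "Page"
segments = [href.strip("/").split("/")[-1] for href in parser.hrefs]
print(segments)  # ['Page']
```

The parser handles attribute order, quoting style, and extra whitespace, so the segment logic only has to deal with the URL path itself.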

1 Answer


Change your regex as follows:

href="\/[^"]*\/(.*?)\/"


`[^"]*` is a negated character class that matches any character except a double quote, zero or more times, so this part of the match cannot run past the closing quote of the `href` value.
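Here is a quick check of the suggested pattern, again using Python's `re` only as a scratchpad. First, `[^"]*` swallows `Test/URL/Page/`. It then gives characters back until the rest of the pattern, `\/(.*?)\/"`, can match. On this input that first happens when `[^"]*` stops after `Test/URL`, which leaves `Page` for the group:

```python
import re

html = '<a href="/Test/URL/Page/" title="" >Test</a>'

# [^"]* cannot cross a double quote; after backtracking it settles on "Test/URL",
# and (.*?) then captures the final path segment of this href.
match = re.search(r'href="\/[^"]*\/(.*?)\/"', html)
print(match.group(1))  # Page
```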

Answered by Avinash Raj
  • Thanks, it gives the result I need, but so does `href="\/[^"]*\/(.*)\/"`. Could you explain why there is a `?` in your answer? – Tony Stark Feb 04 '15 at 19:11
  • `.*?` will do a [non-greedy match](http://stackoverflow.com/questions/11898998/regex-match-non-greedy). – Avinash Raj Feb 05 '15 at 01:38
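The following Python sketch may explain why both versions gave Tony Stark the same result. In the question's HTML, only one `/"` follows the path. "As much as possible" (greedy `.*`) and "as little as possible" (lazy `.*?`) therefore end at the same place. The difference only shows when the closing delimiter occurs more than once. The quote-delimited patterns in the second half are a hypothetical illustration, not part of the answer:

```python
import re

html = '<a href="/Test/URL/Page/" title="" >Test</a>'

# The answer's pattern with and without the "?": only one '/"' exists in the
# input, so the greedy and lazy groups end at the same place.
greedy = re.search(r'href="\/[^"]*\/(.*)\/"', html).group(1)
lazy = re.search(r'href="\/[^"]*\/(.*?)\/"', html).group(1)
print(greedy, lazy)  # Page Page

# With a delimiter that occurs several times (a plain double quote), greedy .*
# runs to the LAST quote while lazy .*? stops at the FIRST one.
greedy_quote = re.search(r'"(.*)"', html).group(1)
lazy_quote = re.search(r'"(.*?)"', html).group(1)
print(greedy_quote)  # /Test/URL/Page/" title="
print(lazy_quote)    # /Test/URL/Page/
```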