I have a text file full of names, and I want to match them all with a regex.

Each name is followed by the text fsa fwb fcc, e.g.:

">Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc

I want to use the following expression to match the names:

""">.+?""fsa fwb fcc"

In other words, match all text from "> up to fsa fwb fcc; I can then strip the excess from the match myself.

However, because "> occurs throughout the file, the match starts much earlier than I want. I have always wondered how to match from the LAST occurrence of something (in this case, ">) up to the specified end.
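
To make the problem concrete, here is a minimal Python sketch. Python's re module is used only for illustration, the sample line with its extra leading ">sometext is an assumption modelled on the example above, and the pattern is a simplified form of the one in the question. It suggests that a lazy quantifier does not help here: laziness only shortens the right-hand end of the match, while the engine still starts at the leftmost "> from which a match is possible.

```python
import re

# Hypothetical line modelled on the question's example, with an earlier "> added.
line = r'">sometext">Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc'

# Simplified form of the lazy pattern from the question.
lazy = re.search(r'">.+?fsa fwb fcc', line)

# The engine reports the leftmost possible match, so it begins at the first ">.
print("match starts at:", lazy.start())
print("last \"> is at:", line.rfind('">'))
print("whole line matched:", lazy.group(0) == line)
```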

Cœur
John Cliven
  • In your particular case, [`RegexOptions.RightToLeft`](http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regexoptions.aspx) should do it. – Martin Ender Aug 15 '13 at 21:01
  • Don't parse HTML with regular expressions. – Mulan Aug 15 '13 at 21:01
  • And what naomik said. [This](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags?rq=1) is at the top of the related questions. ;) – Martin Ender Aug 15 '13 at 21:02
  • This isn't parsing, rather this is pattern matching. Given the requirements I doubt this can be accomplished as easily with an HTML parsing engine as it can be via pattern matching. Also I'm not sure \u0012 is a valid html character. – Ro Yo Mi Aug 16 '13 at 02:13
  • Thanks m.buettner, RegexOptions.RightToLeft works perfectly! Exactly what I was looking for. – John Cliven Aug 16 '13 at 11:54
  • @neomik, Denomales is correct: this is not an HTML file, and the content is static, predictable, and does not vary, so a regex seems fine for matching. – John Cliven Aug 16 '13 at 11:55

2 Answers

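You can try this:

.+((fsa|fwb|fcc).+)$

.+ greedily matches one or more characters at the start.

(fsa|fwb|fcc) matches and captures one of the keywords.

The outer group ((fsa|fwb|fcc).+) captures the keyword together with the characters that follow it.

$ matches the end of the line (in .NET, only when RegexOptions.Multiline is set; otherwise it matches the end of the input).

EDIT: As suggested by m.buettner, RegexOptions.RightToLeft should work for your case.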

Rahul Tripathi
  • [Please add some explanation](http://meta.stackexchange.com/questions/177757/are-answers-that-just-contain-a-regular-expression-pattern-really-good-answers) on how this regex works. – HamZa Aug 15 '13 at 21:19
  • @HamZa: Updated with an explanation. Do let me know if that doesn't work! :) – Rahul Tripathi Aug 15 '13 at 21:36
  • Thanks for the explanation, unfortunately it doesn't work for me, but instead matches the entire file! – John Cliven Aug 16 '13 at 11:53
  • @stanleyhiggins:- Got your point. Updated my answer as well. So that it can be used for future reference also. :) – Rahul Tripathi Aug 16 '13 at 11:57

Description

It looks like your ending string is literally fsa fwb fcc, and the substring you're interested in starts directly after the last "> before that ending string.

This expression will:

  • find the substring between the last "> and the next fsa fwb fcc

">((?:(?!">).)*)fsa\sfwb\sfcc

Sample Text

">sometext">A Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc
">sometext">B Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc
">sometext">C Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc

Matches Found:

[0][0] = ">A Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc
[0][1] = A Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"

[1][0] = ">B Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc
[1][1] = B Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"

[2][0] = ">C Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc
[2][1] = C Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"
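
As a quick check of the expression above, here is a small Python sketch. Python's re module is an assumption made for illustration (the thread itself points towards .NET); the pattern and sample text are copied from this answer. The key piece is (?:(?!">).)*, which consumes one character at a time but refuses to step onto a position where "> begins, so the match cannot span an earlier "> and has to start at the last one.

```python
import re

# Sample text from the answer; raw strings keep \u0012 as literal characters.
SAMPLE = "\n".join([
    r'">sometext">A Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc',
    r'">sometext">B Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc',
    r'">sometext">C Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc',
])

# (?!">) is a negative lookahead: each consumed character must not start a ">.
PATTERN = re.compile(r'">((?:(?!">).)*)fsa\sfwb\sfcc')

# Group 1 holds everything between the last "> and the end marker.
captures = [m.group(1) for m in PATTERN.finditer(SAMPLE)]
for text in captures:
    print(text)
```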

Or

If you want to go further and capture only from the last "> up to the first \u0012 before fsa fwb fcc (i.e. the actual name and not the markup text), then have a look at this expression:

">((?:(?!">).)*?)\\u0012(?:(?!">).)*fsa\sfwb\sfcc

Sample Text

">sometext">A Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc
">sometext">B Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc
">sometext">C Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc

Matches Found

[0][0] = ">A Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc
[0][1] = A Dave Smith

[1][0] = ">B Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc
[1][1] = B Dave Smith

[2][0] = ">C Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc
[2][1] = C Dave Smith
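
The second expression can be tried the same way. This is a sketch in Python (again an assumption for illustration, not the asker's environment) using the pattern and sample text from this answer. Note that \\u0012 in the pattern matches the six literal characters \u0012 as they appear in the file, not a control character, and the lazy *? makes the capture stop at the first such occurrence.

```python
import re

# Sample text from the answer; raw strings keep \u0012 as literal characters.
SAMPLE = "\n".join([
    r'">sometext">A Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc',
    r'">sometext">B Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc',
    r'">sometext">C Dave Smith\u0012\/a>\u0012\/div>\u0012div class=\"fsa fwb fcc',
])

# Lazy capture up to the first literal \u0012, then skip on to the end marker,
# never crossing another "> along the way.
NAME_PATTERN = re.compile(r'">((?:(?!">).)*?)\\u0012(?:(?!">).)*fsa\sfwb\sfcc')

names = [m.group(1) for m in NAME_PATTERN.finditer(SAMPLE)]
print(names)
```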
animuson
Ro Yo Mi
  • This is a really great explanation that is so thorough and works perfectly! I really appreciate that Denomales! – John Cliven Aug 16 '13 at 11:59