
I'm having trouble understanding how regexes work in C. Basically, I have an XML file (I can't use an XML parser) containing lines like this:

<Node Bla="blabla" Name="this is my name" .... />
<Node Name="this is my name" Bla="blabla" .... />

What I would like to do is extract the name part of each line. So far I have been using the following regex:

char *regex_str = "Name=\"([^\"]*)\"";

But this gives me Name="this is my name", whereas I only want the this is my name part.

What am I doing wrong?

Thomas Eschemann
  • The flippant "I can't use an XML parser" makes [this link](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) almost a necessity, I think. :) – unwind Jul 09 '14 at 09:43
  • I'm completely aware that this is in no way the best way to proceed, and believe me, if I could use an XML parsing library I would. However, I am relatively confident I can "parse" the file safely, since the XML structure is flat and the file is dynamically generated and isn't to be edited by users. – Thomas Eschemann Jul 09 '14 at 09:53

2 Answers


Use a lookbehind so that the match starts just after Name=" and runs up to the next " character. This needs an engine that supports lookbehind, such as PCRE; POSIX regcomp does not:

(?<=Name=\")([^\"]*)

Explanation:

  • (?<=Name=\") Sets the matching marker just after to the string Name"
  • ([^\"]*) Captures all the characters not of " zero or more times.
Avinash Raj

You may not need a capturing group.

Assuming your library has lookbehind (which it definitely does if it's PCRE), you can use this regex to match the name:

(?<=[Nn]ame=")[^"]+

Explanation

  • the lookbehind (?<=[Nn]ame=") asserts that what precedes is Name=" or name="
  • [^"]+ matches one or more chars that are not a "
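
To make the two steps above concrete, the sketch below reproduces what the pattern does using only the C standard library, with no regex engine at all. This is a swapped-in technique for illustration, not the answer's method: it finds a position preceded by Name=" or name=", then takes the run of non-quote characters that follows. The function name find_name_value is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original post): returns a pointer to the
 * text that (?<=[Nn]ame=")[^"]+ would match in `line` and stores its length
 * in *len, or returns NULL if there is no such match. */
const char *find_name_value(const char *line, size_t *len)
{
    static const char tail[] = "ame=\"";
    const char *p = line;

    while ((p = strstr(p, tail)) != NULL) {
        /* Lookbehind part: the character before "ame=\"" must be N or n. */
        int preceded = p > line && (p[-1] == 'N' || p[-1] == 'n');

        p += sizeof tail - 1;                /* step past ame=" */
        if (preceded) {
            size_t n = strcspn(p, "\"");     /* run of characters that are not " */
            if (n > 0) {                     /* + requires at least one character */
                *len = n;
                return p;
            }
        }
    }
    return NULL;
}
```

The returned pointer aims into the original line, so the caller copies *len bytes if a separate string is needed. Like the regex, this also accepts attributes whose names merely end in Name, such as a hypothetical FirstName="...".
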

zx81