I have the following data:

int  time="1356280261"
char value="3000"

bankLine {
  char value="3000"
  char currency="EUR"
  int  time="1356280261"
} #bankLine

I am parsing this data recursively and want to match only the two variables outside the block, each one separately.

I have this regex to match a variable:

/(?:char|int)\s*([A-z0-9]*)\s*=\s*"(.*)"/

Yet, the regex matches all occurrences inside the block, too.

How can I match only the first two variables individually and ignore everything inside the bankLine block?

Andrew Cheong
jones

3 Answers

4

It's a bit hackish, but you can try adding a negative lookahead, like this:

/(?:char|int)\s*([A-z0-9]*)\s*=\s*"(.*)"(?![^{]*\})/
                                        ^^^^^^^^^^^

This assumes that all braces are balanced. Fortunately, nesting shouldn't matter here (whereas normally it would, in similar questions), since you're looking for the case outside braces.

The lookahead is based on this observation: if, scanning forward from the end of a match, you encounter a close-brace before encountering any open-brace, then you can reasonably assume the match is within braces.

One is tempted to extend this the other way to include a negative lookbehind, but unfortunately most implementations do not support variable-length lookbehinds.
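As an illustration of that limitation, here is a minimal sketch using Python's standard `re` module, which is one implementation that rejects variable-length lookbehinds. The lookbehind pattern below is only a hypothetical mirror image of the lookahead and is not part of the original answer.

```python
import re

# Hypothetical mirror of the lookahead: "no unclosed '{' behind this point".
# The [^}]* inside the lookbehind can match any number of characters,
# so the assertion has no fixed width.
try:
    re.compile(r'(?<!\{[^}]*)(?:char|int)')
    supported = True
except re.error as exc:
    # Python's re refuses to compile the pattern at all.
    supported = False
    print("rejected:", exc)
```

Other engines may behave differently, so check the documentation of the flavor you are actually using.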

EDIT:

As discussed in the comments below, these fixes are recommended:

/(?:char|int)\s*([A-Za-z0-9]*)\s*=\s*"([^"]*)"(?![^{]*\})/
                    ^^^                ^^^^^
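
To see the corrected pattern in action, here is a small sketch in Python (the question does not say which language is used, so Python and the names `DATA`, `PATTERN` and `top_level` are assumptions). The input is the sample from the question.

```python
import re

# Sample input copied from the question.
DATA = '''int  time="1356280261"
char value="3000"

bankLine {
  char value="3000"
  char currency="EUR"
  int  time="1356280261"
} #bankLine
'''

# The trailing negative lookahead rejects a candidate match whenever a '}'
# can be reached from the end of the match without first passing a '{'.
PATTERN = re.compile(
    r'(?:char|int)\s*([A-Za-z0-9]*)\s*=\s*"([^"]*)"(?![^{]*\})'
)

# findall returns one (name, value) tuple per match.
top_level = PATTERN.findall(DATA)
print(top_level)
```

Only the two declarations before the `bankLine` block are returned. As noted above, this relies on the braces being balanced.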
Andrew Cheong
  • +1, but I changed `A-z` to `A-Za-z` because there are some non-letter characters between ASCII `Z` and ASCII `a` that you don't want to match. – Tim Pietzcker Dec 25 '12 at 16:02
  • @TimPietzcker - Thanks, Tim. I just pasted the regex from his question to show him the difference, but it's a good point. I'll edit his question too. – Andrew Cheong Dec 25 '12 at 16:03
  • Good idea. Also, `"([^"]*)"` would probably be better than `"(.*)"`, but since it seems that there's at most one key/value pair per line, and that dotall mode isn't set, it's not a big issue either way. – Tim Pietzcker Dec 25 '12 at 16:09
  • @TimPietzcker - Actually, that's a good point. In the off-chance `dotall` _is_ set, I think the greediness of `.*` can produce unwanted results, _i.e._ if we're within braces, then matching everything _past_ the close-brace until we're _outside_, _then_ applying (and matching) the negative lookahead. I'll make the edit. I see other improvements can be made too, but they seem minor compared to this. – Andrew Cheong Dec 25 '12 at 16:13
  • @TimPietzcker - In case you check back and get confused, I reverted the `A-Za-z` change to show the difference from the original regex, then mentioned the additional fixes in a separate edit. – Andrew Cheong Dec 25 '12 at 16:17
  • thx @acheong87 and TimPietzcker (can only mention one person...) :) – jones Dec 26 '12 at 12:02
0

See if something like this works for you:

^(?:char|int)[^\n\r]*?$

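Or just put a ^ in front of your expression. Either way, multiline mode has to be enabled so that ^ and $ match at the start and end of each line rather than only at the ends of the whole input.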
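
A rough sketch of the "put a ^ in front" idea, written in Python as an assumption since the question does not name a language. It uses the question's pattern with the character-class fixes from the accepted answer, and shows why the multiline flag matters. The names `ANCHORED`, `without_flag` and `with_flag` are illustrative only.

```python
import re

# Sample input copied from the question.
DATA = '''int  time="1356280261"
char value="3000"

bankLine {
  char value="3000"
  char currency="EUR"
  int  time="1356280261"
} #bankLine
'''

# The question's pattern with a leading ^ added.
ANCHORED = r'^(?:char|int)\s*([A-Za-z0-9]*)\s*=\s*"([^"]*)"'

# Without MULTILINE, ^ matches only at the very start of the input.
without_flag = re.findall(ANCHORED, DATA)

# With MULTILINE, ^ matches at the start of every line, so both
# unindented declarations are found and the indented ones are skipped.
with_flag = re.findall(ANCHORED, DATA, re.MULTILINE)

print(without_flag)
print(with_flag)
```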

0

This might not be the best solution, but I think this will work for your case:

/^(int|char).*$/

The reason is that your declarations are indented inside the bankLine block. That's what we're taking advantage of here: we simply match all lines that start with int or char with no leading whitespace.
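
The dependence on indentation can be sketched as follows, assuming Python and multiline mode. The `FLAT` input is a hypothetical variant of the question's data with the indentation removed; it does not come from the question.

```python
import re

# ^ and $ must match per line, hence the MULTILINE flag.
LINE_START = re.compile(r'^(int|char).*$', re.MULTILINE)

# Sample input copied from the question (block contents are indented).
INDENTED = '''int  time="1356280261"
char value="3000"

bankLine {
  char value="3000"
  char currency="EUR"
  int  time="1356280261"
} #bankLine
'''

# Hypothetical input: same kind of data, but the block is not indented.
FLAT = '''char value="3000"
bankLine {
char currency="EUR"
} #bankLine
'''

indented_hits = [m.group(0) for m in LINE_START.finditer(INDENTED)]
flat_hits = [m.group(0) for m in LINE_START.finditer(FLAT)]

print(indented_hits)  # only the two top-level declarations
print(flat_hits)      # also picks up the declaration inside the block
```

So this approach is fine as long as the formatting is consistent, but it breaks as soon as a declaration inside a block starts in the first column.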

abhi.gupta200297