0

I'm trying to match some variable names in an HTML document to populate a dictionary. I have the HTML

<div class="no_float">
    <b>{node_A_test00:02d}</b>{{css}}
    <br />
    Block mask: {block_mask_lower_node_A} to {block_mask_upper_node_A}
    <br />
</div>
<div class="sw_sel_container">
    Switch selections: 
    <table class="sw_sel">
        <tr>
            <td class="{sw_sel_node_A_03}">1</td>
            <td class="{sw_sel_node_A_03}">2</td>
            <td class="{sw_sel_node_A_03}">3</td>
            <td class="{sw_sel_node_A_04}">4</td>
            <td class="{sw_sel_node_A_05}">5</td>

I want to match the text between { and either } or :. But if it starts with {{, I don't want to match it at all (I will be using this for inline CSS).

So far I have the regex

(?<=\{)((?!{).*?)(?=\}|:)

but this still matches the text inside {{css}}.

Floris
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – wim May 02 '13 at 00:16
  • 3
    Well... this isn't exactly parsing HTML since the OP isn't trying to do anything with tags. – Andrew Clark May 02 '13 at 00:19
  • @F.J True. But it's kind of obligatory to post that link everywhere HTML and Regex are mentioned in the same question, isn't it? – Kyle Strand May 02 '13 at 00:49

3 Answers

1

You could do something like this:

import re

re.findall(r'''
    (?<!\{)    # No opening bracket before
    \{         # Opening bracket
      ([^}]+)  # Stuff inside brackets
    \}         # Closing bracket
    (?!\})     # No closing bracket after
''', '{foo} {{bar}} {foo}', flags=re.VERBOSE)
# ['foo', 'foo']
Blender
  • Thanks Blender, that works. I also found that because I'm using curly braces I can use string.Formatter() and parse out the field names. – user2341223 May 02 '13 at 00:35
  • Wouldn't this match {{bar}, though? Granted, that's only a problem if brackets are unmatched. – Kyle Strand May 02 '13 at 00:51
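
For anyone curious about the string.Formatter() route mentioned in the comment above, here is a minimal sketch. The helper name field_names and the shortened html string are made up for illustration; only Formatter().parse comes from the standard library. It relies on parse reporting a field name of None for chunks that are pure literal text, which is how escaped braces like {{css}} come out.

```python
from string import Formatter

def field_names(template):
    """Return the replacement-field names in a str.format-style template."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
    # field_name is None for chunks that hold only literal text, which
    # includes escaped braces such as {{css}}.
    return [name for _, name, _, _ in Formatter().parse(template)
            if name is not None]

# Shortened version of the HTML from the question (illustrative only).
html = ('<b>{node_A_test00:02d}</b>{{css}} '
        'Block mask: {block_mask_lower_node_A} to {block_mask_upper_node_A}')

# Populate a dictionary keyed by field name, with placeholder values.
values = dict.fromkeys(field_names(html))
print(list(values))
```

The format spec (02d) is split off by the parser, so the colon case needs no special handling here.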
0

This seems to be working:

(?<=(?<!{){)[^{}:]+

and this with a capture:

(?<!{){([^{}:]+)
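
A quick way to try the capturing version from Python, as a sketch: the braces are escaped here for readability (an assumption on my part, the pattern is otherwise the same), and the html string is a trimmed copy of the snippet in the question.

```python
import re

# Capturing pattern from above, with the literal braces escaped.
pattern = re.compile(r'(?<!\{)\{([^{}:]+)')

# Trimmed copy of the HTML in the question.
html = '''
<b>{node_A_test00:02d}</b>{{css}}
Block mask: {block_mask_lower_node_A} to {block_mask_upper_node_A}
<td class="{sw_sel_node_A_03}">1</td>
<td class="{sw_sel_node_A_04}">4</td>
'''

names = pattern.findall(html)
# dict.fromkeys drops repeated names while keeping first-seen order.
variables = dict.fromkeys(names)
print(list(variables))
```

Note that this pattern does not require a closing } or :, so it assumes the braces in the document are balanced.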
perreal
0

I see that you've already found a solution that works, but I thought it might be worthwhile to explain what the problem with your original regex is.

  • (?<=\{) means that a { must precede whatever matches next. Fair enough.
  • ((?!{).*?) will match anything that starts with a character other than {. Okay, so we're only matching things inside the braces. Good.

But now consider what happens when you have two opening braces: {{bar}}. Consider the substring bar. What precedes the b? A {. Does bar start with {? Nope. So the regex will consider this a match.

You have, of course, prevented the regex from matching {bar, which is what it would do if you left the (?!{) out of your pattern, because {bar starts with a {. But as soon as the regex engine determines that no valid match starts at the second { character, it moves on to the next character, b, and sees that a match starts there.
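
The behaviour described above can be checked with a small Python sketch. The variable names and the test string are mine; the first pattern is the one from the question with the inner brace escaped, and the second is the same pattern with the (?!{) lookahead removed.

```python
import re

text = '{foo} {{bar}}'

# Pattern from the question (inner brace escaped for clarity).
original = re.compile(r'(?<=\{)((?!\{).*?)(?=\}|:)')
# Same pattern without the (?!\{) lookahead.
without_lookahead = re.compile(r'(?<=\{)(.*?)(?=\}|:)')

original_matches = original.findall(text)
relaxed_matches = without_lookahead.findall(text)

print(original_matches)  # ['foo', 'bar']
print(relaxed_matches)   # ['foo', '{bar']
```

So the lookahead only stops the match from starting on the second {; the engine then retries one character later, where b is still preceded by a {.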

Now, just for kicks, here's the regex I'd use:

(?<!{){([^{}:]+)[}:](?!})

  • (?<!{) : the match shouldn't be preceded by {.
  • { : the match starts with an open brace.
  • ([^{}:]+) : group everything that isn't an open-brace, close-brace, or colon. This is the part of the match that we actually want.
  • [}:] : end the match with a close-brace or colon.
  • (?!}) : the match shouldn't be followed by }.
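
As a sketch of how the pieces listed above fit together in Python: the pattern below uses a negative lookbehind written as (?<!\{) and a negative lookahead written as (?!\}), with the braces escaped. The sample strings and variable names are made up for illustration.

```python
import re

# Not preceded by {, an opening {, the captured name, a closing } or :,
# and not followed by another }.
pattern = re.compile(r'(?<!\{)\{([^{}:]+)[}:](?!\})')

sample = '{node_A_test00:02d} {{css}} {block_mask_lower_node_A}'
found = pattern.findall(sample)
print(found)

# Excluding { from the captured group also rejects an unbalanced
# double brace, the case raised in the comments on the first answer.
unbalanced = pattern.findall('{{bar}')
print(unbalanced)
```

Because the group cannot contain a {, the first brace of {{bar} can never start a match, and the second one is ruled out by the lookbehind.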
Kyle Strand