
I am currently trying to scan a file for a specific pattern and capture pieces of the matched pattern to use in a replacement string.

My current Python 3 script is using this pattern and captures the data in simple cases.

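    import mmap
    import re

    def readFile(filename):
        # Raw bytes literal, so the regex escapes are not read as string escapes.
        pattern = re.compile(rb"(<%InsertIf expression=\"\$\{\(\((.*?)\['(.*?)'\].*?'(.*?)'\)\)\}\".*?/InsertIf%>)", re.DOTALL)
        # Binary mode, since both the pattern and the mmap contents are bytes.
        with open(filename, 'r+b') as f:
            data = mmap.mmap(f.fileno(), 0)
            for match in pattern.finditer(data):
                print(match.groups())
                print("")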

For example, when matching this snippet of the file:

<%InsertIf expression="${((user.MemberAttribute['treatmentcode'] == 'NM'))}" %>some random text goes here<sup>®</sup> membership<%/InsertIf%><%InsertIf expression="${((user.MemberAttribute['treatmentcode'] == 'N1'))}" %>some random text goes here<sup>®</sup> upgrade.<%/InsertIf%><br />

I obtain the desired output from the regex I have in place for these patterns:

(b'<%InsertIf expression="${((user.MemberAttribute[\'treatmentcode\'] == \'NM\'))}" %>some random text goes here<sup>\xc2\xae</sup> membership<%/InsertIf%>', b'user.MemberAttribute', b'treatmentcode', b'NM')

(b'<%InsertIf expression="${((user.MemberAttribute[\'treatmentcode\'] == \'N1\'))}" %>some random text goes here<sup>\xc2\xae</sup> upgrade.<%/InsertIf%>', b'user.MemberAttribute', b'treatmentcode', b'N1') 

However, when the InsertIf expression has additional conditionals, I cannot figure out the appropriate pattern to use for the regex.

Here are two more complex snippets that I am trying to handle. In one case there is an additional '||' conditional. In the other there is an "and" conditional.

<%InsertIf expression="${((user.MemberAttribute['country'] == 'US') || (user.MemberAttribute['country'] == 'CA'))}" %>

In the above case I would expect a second set of captures:

  1. Full InsertIf captured string
  2. user.MemberAttribute
  3. country
  4. US
  5. user.MemberAttribute
  6. country
  7. CA

But since the pattern doesn't account for the second condition, the 4th capture runs on to the last quoted value and returns: 4. US') || (user.MemberAttribute['country'] == 'CA
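A minimal reproduction of that behaviour, as a sketch: the pattern is the one from the script above, written as a raw bytes literal, and the `body` text plus the closing `<%/InsertIf%>` tag are hypothetical additions so that the pattern has something to match.

```python
import re

# Same pattern as in the script above, as a raw bytes literal.
pattern = re.compile(
    rb"(<%InsertIf expression=\"\$\{\(\((.*?)\['(.*?)'\].*?'(.*?)'\)\)\}\".*?/InsertIf%>)",
    re.DOTALL,
)

# The '||' snippet; "body" and the closing tag are made up for this test.
sample = (
    b"<%InsertIf expression=\"${((user.MemberAttribute['country'] == 'US') || "
    b"(user.MemberAttribute['country'] == 'CA'))}\" %>body<%/InsertIf%>"
)

groups = pattern.search(sample).groups()

# groups[0] is the full block; the remaining three are the captured pieces.
for index, value in enumerate(groups[1:], start=2):
    print(index, value)
```

The last group is lazy, but it still has to be followed by `'))}"`, and the first place where that holds is after `CA`, so everything in between ends up inside the capture.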

AND example

<%InsertIf expression="${((user.MemberAttribute['country']=='US') and (user.MemberAttribute['treatmentcode']=='NM'))}" %><%InsertCSE id="XXXXX"%><%/InsertIf%>

I have the same expectations here, and get the same bad result, as in the '||' example above.

Any assistance with the pattern is greatly appreciated. I am still learning regular expressions and this one is just a tad out of my depth.

Thanks.

Adding additional details as requested: I am essentially trying to perform a conversion of one syntax to another within a file.

Example: I want to find this pattern...

<%InsertIf 
expression="${((user.MemberAttribute['treatmentcode']=='NM'))}" %>
<%InsertCSE id="4000116068"%><%/InsertIf%>
<%InsertElse expression="${((user.MemberAttribute['treatmentcode']=='N1'))}" %>
<%InsertCSE id="4000116069"%>
<%/InsertElse%>

and convert it to this pattern while preserving the variable values:

%%[ if treatmentcode == "NM" then ]%%
%%=contentArea("4000116068")=%%
%%[ elseif treatmentcode == "N1" then ]%%
%%=contentArea("4000116069")=%%
%%[ endif ]%%

The challenge comes into play when there are additional conditionals as part of the expression itself. The original snippets above show more of the details for the input. I can get simple expressions working as desired but it falls apart on the more complex statements.

I was initially trying to take a simple InsertIf case and get it working. I could then loop the file to handle the InsertElse and other cases.

Brendan Abel
Andrew
  • This doesn't seem to be possible. The closest I got was to have the last occurrence of the pattern always matched (I only tried the `||` sample). `(<%InsertIf expression="\${\((?:\((.*?\..*?)\['(.*?)'\] == '(.*?)'\)(?: \|\| )?)+\)}"\s?%>.*?<%\/InsertIf%>)`. This may help you understand why, as the same applies to Python as to JavaScript: http://stackoverflow.com/a/3537914/1476989 – Peter Gordon Jun 20 '16 at 19:05
  • You can test out the Regex I made here: http://regexr.com/3dljk – Peter Gordon Jun 20 '16 at 19:06
  • You only get as many group values as there are groups in your regex, if you do not define 7 groups, you can't get more. What if you have 7 `||` conditions? Can you use PyPi `regex` module? There, you can use `.captures` collection. – Wiktor Stribiżew Jun 20 '16 at 20:11
  • Thanks pgmann and wiktor stribizew. I've been testing various patterns using http://pythex.org/ which is a similar tool. I do understand that the number of groups returned is based upon the groups defined in the regex. I guess I was hoping for a magical looping regex that could lookahead for the conditional expressions. It may in fact be impossible but I wanted to seek advice before throwing in the towel. I may be better off capturing the entire group and manipulating the string in a couple of steps. – Andrew Jun 20 '16 at 20:27
  • Could you please share an input string, and the expected output? So that we could answer the question. – Wiktor Stribiżew Jun 20 '16 at 21:09
  • See http://ideone.com/vFBTyg – Wiktor Stribiżew Jun 20 '16 at 21:16
  • Thanks for the assistance. I added more detail to the original post to try and address your request. Your ideone.com post certainly gives me more insight on approaches to take. Thank you. – Andrew Jun 20 '16 at 22:15
  • Your title should be a brief, specific description of your problem. – Nic Jun 21 '16 at 22:11
  • Hi, I've hit that `InsertIf` `expression` syntax, but can't find any documentation about it: can you point me to any? – watery Mar 28 '17 at 15:36

1 Answer


Here is an idea for solving the first problem. You can do it in two steps:

  1. Get the whole expression with this pattern:

    re.compile(b"(<%InsertIf expression=\"\$\{\(\((.*?)\)\)\}\".*?\/InsertIf%>)", re.DOTALL)

  2. Once you have the expression, use a second pattern to find each name, attribute, and value:

    ([^\[]*)\[\'([^\']*)\'\].*?\'([^\']*)\'

    Here I prefer [^\']* or [^\[]* to .*?, because a negated character class cannot run past the next quote or bracket.

Use findall() with the second pattern to find all matches.
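A minimal sketch of the two steps, under a few assumptions: `block_re`, `cond_re` and `parse_blocks` are names made up for this example, and the name part is matched with `[\w.]+` rather than `[^\[]*` so that the `') || (` between two conditions is not pulled into the next name. The `body` text in the sample is also made up.

```python
import re

# Step 1: match each whole block and capture the raw expression inside ${...}.
block_re = re.compile(
    rb"<%InsertIf expression=\"\$\{(.*?)\}\".*?<%/InsertIf%>", re.DOTALL
)

# Step 2: inside one expression, find every name['attribute'] == 'value' triple.
cond_re = re.compile(rb"([\w.]+)\['([^']*)'\]\s*==\s*'([^']*)'")


def parse_blocks(data):
    """Return a list of (full_block, [(name, attribute, value), ...])."""
    return [
        (m.group(0), cond_re.findall(m.group(1)))
        for m in block_re.finditer(data)
    ]


sample = (
    b"<%InsertIf expression=\"${((user.MemberAttribute['country'] == 'US') || "
    b"(user.MemberAttribute['country'] == 'CA'))}\" %>body<%/InsertIf%>"
    b"<%InsertIf expression=\"${((user.MemberAttribute['country']=='US') and "
    b"(user.MemberAttribute['treatmentcode']=='NM'))}\" %>"
    b"<%InsertCSE id=\"XXXXX\"%><%/InsertIf%>"
)

results = parse_blocks(sample)
for block, conditions in results:
    print(conditions)
```

Because `findall()` returns one tuple per condition, the number of conditions in an expression no longer has to match the number of groups in the pattern.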

The second problem, converting one "language" to another, needs more information to be captured, such as the == operator.
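As a rough sketch of what that could look like: the code below also captures the comparison operator and whatever joins two conditions, then builds the `if` line from the question. The function names, the handling of only `==` and `!=`, and the `JOINERS` mapping (in particular `||` to `or`) are assumptions for illustration, not something the question confirms. It works on `str`, so bytes read from the file would need decoding first.

```python
import re

# One comparison: name['attribute'] <op> 'value'. Only == and != are handled.
cond_re = re.compile(r"[\w.]+\['([^']*)'\]\s*(==|!=)\s*'([^']*)'")

# The joiner that may appear between two comparisons.
joiner_re = re.compile(r"\|\||&&|\band\b|\bor\b")

# Assumed mapping from source joiners to target keywords.
JOINERS = {"||": "or", "&&": "and", "and": "and", "or": "or"}


def convert_expression(expr):
    """Turn the inside of ${...} into a target-syntax condition string."""
    out, pos = [], 0
    for m in cond_re.finditer(expr):
        # Look for a joiner only in the text between two comparisons.
        joiner = joiner_re.search(expr, pos, m.start())
        if out and joiner:
            out.append(JOINERS[joiner.group(0)])
        attribute, op, value = m.groups()
        out.append(f'{attribute} {op} "{value}"')
        pos = m.end()
    return " ".join(out)


def to_if_line(expr):
    """Wrap a converted condition in the target 'if' syntax."""
    return "%%[ if " + convert_expression(expr) + " then ]%%"


print(to_if_line("((user.MemberAttribute['treatmentcode']=='NM'))"))
print(to_if_line(
    "((user.MemberAttribute['country'] == 'US') || "
    "(user.MemberAttribute['country'] == 'CA'))"
))
```

This ignores how the parentheses are nested, so it is only suitable for flat expressions like the ones shown in the question.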

Bob