
I have some XML in which I want to replace an attribute value.

Below, I want to replace "Alarm" within <Alarm name="Alarm"> with the name of the Tag to which it belongs. E.g. for the first Tag below, the Tag name is "Buffer Tank Pig Catcher Equipment Fault", so the Alarm name would need to be set to "Buffer Tank Pig Catcher Equipment Fault" as well. I plan to use Notepad++, so I need to use capture groups in order to put it all back together.

I have a regex saved on regexr: regexr.com/58ac5

(<Tag name=)("[\w\d\s]+")( path="Alarms"[\w\d\s=">\r\n</\-;\[{}\]!@#$%^&*+(),\.]+<Alarm name=)("[\w\d\s]+")

But it's not returning the right results. It captures the correct start, but it matches up until the very last Tag's <Alarm name="..." in the XML (if you add more than two of the tag elements you'll see that it gets to the last Tag element before finding the <Alarm name="..." part and ending the capture). It's not capturing each Tag individually.

So my question is, how can I capture the text in between the two text blocks (e.g. <Tag name="Tag Name" path="Alarms" and <Alarm name="Alarm") for each Tag element? (What should this part be replaced with? [\w\d\s=">\r\n</\-;\[{}\]!@#$%^&*+(),\.]+)

Thanks in advance!

<Tag name="Buffer Tank Pig Catcher Equipment Fault" path="Alarms" type="OPC">
     <Property name="Value"/>
     <Property name="DataType">6</Property>
     <Property name="OPCServer">Ignition OPC-UA Server</Property>
     <Property name="OPCItemPath">ns=1;s=[{PLCName}]{DeviceName}_Trfer.Alm.11</Property>
     <Property name="EngUnit">%</Property>
     <Property name="PrimaryHistoryProvider">SQLServer</Property>
     <Property name="HistoryMaxAgeMode">5</Property>
     <Property name="HistoryMaxAge">4</Property>
     <Alarms>
        <Alarm name="Alarm">
           <Property name="setpointA">1</Property>
           <Property name="priority">3</Property>
           <Property name="ackMode">1</Property>
           <Property name="label" bindtype="Expression">&apos;{InstanceName} {TagName}&apos;</Property>
           <Property name="displayPath" bindtype="Expression">replace(replace(replace(
 {itemPath}
 
,{System/Site Name} + &apos;/&apos;, &apos;&apos;)
,&apos;/Alarms/&apos;, &apos;/&apos;)
,&apos;/&apos;,&apos; &apos;)</Property>
        </Alarm>
     </Alarms>
  </Tag>
  <Tag name="Low Flow Alarm" path="Alarms" type="OPC">
     <Property name="Value"/>
     <Property name="DataType">6</Property>
     <Property name="OPCServer">Ignition OPC-UA Server</Property>
     <Property name="OPCItemPath">ns=1;s=[{PLCName}]{DeviceName}_Trfer.Alm.15</Property>
     <Property name="EngUnit">%</Property>
     <Property name="PrimaryHistoryProvider">SQLServer</Property>
     <Property name="HistoryMaxAgeMode">5</Property>
     <Property name="HistoryMaxAge">4</Property>
     <Alarms>
        <Alarm name="Alarm">
           <Property name="setpointA">1</Property>
           <Property name="ackMode">1</Property>
           <Property name="label" bindtype="Expression">&apos;{InstanceName} {TagName}&apos;</Property>
           <Property name="displayPath" bindtype="Expression">replace(replace(replace(
 {itemPath}
 
,{System/Site Name} + &apos;/&apos;, &apos;&apos;)
,&apos;/Alarms/&apos;, &apos;/&apos;)
,&apos;/&apos;,&apos; &apos;)</Property>
        </Alarm>
     </Alarms>

Update: I found a situation where this doesn't quite work. If there are <Tag> elements that don't have an <Alarm> element, the match keeps going until it finds an <Alarm> element within another <Tag>, so a single find can span multiple <Tag> elements, which is incorrect. Basically I don't want it to return the purpley-blue highlighted section here: https://regexr.com/58pi5 I need to discard the find if it encounters a closing </Tag> element before it finds the <Alarm..> element. I've tried negative lookbehind/lookahead and can't get it working.

njminchin

2 Answers


@Tim Biegeleisen is correct: you should not be using regex for this.

In Python it is only a few lines of code to use the built-in XML parser to get what you need. Note, however, that your file must be valid XML, meaning it must have a root element and all tags must be closed. Your sample is missing a </Tag> at the end and does not have a root element, so I've added those.

import xml.etree.ElementTree as ET

xmlString = """<Document>
 <Tag name="Buffer Tank Pig Catcher Equipment Fault" path="Alarms" type="OPC">
 <Property name="Value"/>
 <Property name="DataType">6</Property>
 <Property name="OPCServer">Ignition OPC-UA Server</Property>
 <Property name="OPCItemPath">ns=1;s=[{PLCName}]{DeviceName}_Trfer.Alm.11</Property>
 <Property name="EngUnit">%</Property>
 <Property name="PrimaryHistoryProvider">SQLServer</Property>
 ... # etc
 </Document>""" # added

root = ET.fromstring(xmlString)
# or if opening from file:
# tree = ET.parse('your_file_name.xml')
# root = tree.getroot()

for tag in root.findall('Tag'):
  tagName = tag.get('name')
  for alarm in tag.iter('Alarm'):
    alarm.set('name', tagName)

newTree = ET.tostring(root)
print(newTree.decode())

with open('output.xml', 'wb') as outputFile:
  outputFile.write(newTree)

# or use tree write method if opened file to begin with
# tree.write('output.xml', encoding="unicode")
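
As a quick check of the case raised in the question's update (a Tag with no Alarm element), here is a minimal self-contained sketch of the same loop. The three-Tag sample document and the names `sample` and `alarm_names` are made up for illustration; only the structure mirrors the question's XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down sample: the middle Tag has no <Alarm> element.
sample = """<Document>
  <Tag name="Has Alarm" path="Alarms" type="OPC">
    <Alarms>
      <Alarm name="Alarm"/>
    </Alarms>
  </Tag>
  <Tag name="No Alarm" path="Alarms" type="OPC">
    <Property name="Value"/>
  </Tag>
  <Tag name="Also Has Alarm" path="Alarms" type="OPC">
    <Alarms>
      <Alarm name="Alarm"/>
    </Alarms>
  </Tag>
</Document>"""

root = ET.fromstring(sample)
for tag in root.findall('Tag'):
    # tag.iter only walks this Tag's own subtree, so an Alarm
    # can never pick up the name of a neighbouring Tag.
    for alarm in tag.iter('Alarm'):
        alarm.set('name', tag.get('name'))

alarm_names = [a.get('name') for a in root.iter('Alarm')]
print(alarm_names)  # ['Has Alarm', 'Also Has Alarm']
```

Because each Alarm is looked up inside its own Tag element, a Tag without an Alarm is simply skipped rather than borrowing one from the next Tag.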


jdaz
    Awesome! Thanks for the code and your time :) Also happy it's in Python, as that's what I've been using these days. – njminchin Jul 13 '20 at 06:15

Your middle term (with the giant character class) has a quantifier of +, which is greedy and so will match/consume everything possible (including any intermediate terminating blocks) while still matching.

Change the quantifier to +?, which is reluctant and so will match/consume as little as possible while still matching, preventing it from skipping over potential terminating text blocks.

See https://regexr.com/58aev
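
To make the difference concrete, here is a rough sketch in Python's `re` module comparing the greedy and reluctant quantifiers on a cut-down sample. The sample XML, the variable names, and the simplified pattern (`.` with `re.DOTALL` in place of the giant character class, `[^"]+` for the names) are my own assumptions, not the asker's exact regex. The third pattern is one possible way to handle the case from the question's update: a negative lookahead that stops the middle part from crossing a closing `</Tag>`. I have not tested it in Notepad++ itself, where it would presumably need ". matches newline" enabled.

```python
import re

# Hypothetical sample: Tag "B" has no <Alarm> element.
xml = """<Tag name="A" path="Alarms" type="OPC">
  <Alarms>
    <Alarm name="Alarm">
    </Alarm>
  </Alarms>
</Tag>
<Tag name="B" path="Alarms" type="OPC">
  <Property name="Value"/>
</Tag>
<Tag name="C" path="Alarms" type="OPC">
  <Alarms>
    <Alarm name="Alarm">
    </Alarm>
  </Alarms>
</Tag>"""

head = r'(<Tag name=)("[^"]+")( path="Alarms"'
tail = r'<Alarm name=)("[^"]+")'

# Greedy: .+ runs to the LAST <Alarm name= in the whole input.
greedy = re.compile(head + r'.+' + tail, re.DOTALL)
# Reluctant: .+? stops at the first <Alarm name= it can reach.
lazy = re.compile(head + r'.+?' + tail, re.DOTALL)
# Reluctant, but each step must not be the start of "</Tag>".
tempered = re.compile(head + r'(?:(?!</Tag>).)*?' + tail, re.DOTALL)

# Put the groups back, reusing group 2 (the Tag name) as the Alarm name.
replacement = r'\1\2\3\2'
greedy_out = greedy.sub(replacement, xml)
lazy_out = lazy.sub(replacement, xml)
tempered_out = tempered.sub(replacement, xml)

for label, out in (("greedy", greedy_out), ("lazy", lazy_out), ("tempered", tempered_out)):
    print(label, re.findall(r'<Alarm name="([^"]+)"', out))
```

On this sample the greedy pattern makes a single match spanning all three Tags, the reluctant one fixes Tag A but lets Tag B borrow Tag C's Alarm, and the lookahead version leaves Tag B alone and names each Alarm after its own Tag.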

Bohemian