I'm not 100% sure what I'm doing wrong. Unfortunately, I have to parse this XML with regex rather than Beautiful Soup or another parser. The script is supposed to replace each matching <case> block with a comment.
CODE:
import re
import shutil

TAG_NAME = 'ruleDefinition'
CASE_LABEL = 'case'
word_file = 'MetaData.txt'
xml_file = 'ProcessAll.xml'
extension = '.bak'
backup = xml_file + extension

# Create backup
shutil.copy2(xml_file, backup)

with open(word_file) as words:
    regex = r'<[^>]+ field=$"({})"[^>]+>'.format(
        '|'.join(
            sorted((word.rstrip('\r\n') for word in words), key=len, reverse=True)
        )
    )

with open(xml_file, 'w') as new_xml:
    with open(backup) as xml:
        names = []
        start = False
        entry = ''
        for line in xml:
            # start tag
            if re.findall(r'<{}[^>]*>'.format(TAG_NAME), line):
                start = True
            # end tag
            if '</{}'.format(TAG_NAME) in line:
                start = False
                if names:
                    new_xml.write('<!-- Removed ' + ','.join(names) + ' -->\n')
                    names = []
            # inside tag
            if start:
                if len(entry):
                    entry += line
                if '<{}'.format(CASE_LABEL) in line:
                    entry += line
                if '</{}'.format(CASE_LABEL) in line:
                    match = re.search(regex, entry)
                    if match:
                        name = match.group(1)
                        names.append(name)
                    else:
                        new_xml.write(entry)
                    entry = ''
                    continue
                if len(entry):
                    continue
            new_xml.write(line)
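One thing that may be worth checking in isolation is the pattern itself. The sketch below hard-codes the two words from MetaData.txt (an assumption, so that it runs on its own) and compares the pattern as posted with a variant that assumes the `$` was meant to be a literal dollar sign inside the quotes, as it appears in the XML attribute values. The names `posted`, `escaped`, and `line` are only illustrative.

```python
import re

# Word list hard-coded for the sketch; the real script reads MetaData.txt.
words = ['NORMALIZED_PRICE_REALTIME', 'STAMP_DUTY_FLAG_REALTIME']
alternation = '|'.join(sorted(words, key=len, reverse=True))

# Pattern as posted: an unescaped `$` is an end-of-string anchor in a regex,
# so a `"` can never follow it.
posted = r'<[^>]+ field=$"({})"[^>]+>'.format(alternation)

# Assumed intent: a literal dollar sign, escaped and placed inside the quotes.
escaped = r'<[^>]+ field="\$({})"[^>]+>'.format(alternation)

line = '<appendField field="$NORMALIZED_PRICE_REALTIME"/>\n'

posted_match = re.search(posted, line)
escaped_match = re.search(escaped, line)

print(posted_match)            # None
print(escaped_match.group(1))  # NORMALIZED_PRICE_REALTIME
```

If the posted pattern never matches, every <case> block would fall into the `else` branch and be written back unchanged, so this is only one possible cause to rule out.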
Word list read by the script:
$ cat MetaData.txt
NORMALIZED_PRICE_REALTIME
STAMP_DUTY_FLAG_REALTIME
XML FILE:
<ruleDefinition name="ProcessAllFields" category="subrule" defaultContext="Security">
  <case label="0xFBDE">
    <!-- dec=64478, NORMALIZED_PRICE_REALTIME -->
    <if>
      <or>
        <equal op1="$temp.updateAlways" op2="true"/>
        <equal op1="#NORMALIZED_PRICE_REALTIME" op2="0"/>
      </or>
      <then>
        <multiply op1="$inField.data" op2="$temp.pScale"
                  store="$NORMALIZED_PRICE_REALTIME" round="-3"/>
        <appendField field="$NORMALIZED_PRICE_REALTIME"/>
      </then>
    </if>
  </case>
  <case label="0xFBDF">
    <!-- dec=64479, STAMP_DUTY_FLAG_REALTIME -->
    <if>
      <or>
        <equal op1="$temp.updateAlways" op2="true"/>
        <equal op1="#STAMP_DUTY_FLAG_REALTIME" op2="0"/>
      </or>
      <then>
        <assign to="$STAMP_DUTY_FLAG_REALTIME" from="$inField.data"/>
        <appendField field="$STAMP_DUTY_FLAG_REALTIME"/>
      </then>
    </if>
  </case>
</ruleDefinition>
Basically, it is deleting all the data containing the string 'case'.
DESIRED RESULT:
<!-- Removed NORMALIZED_PRICE_REALTIME, STAMP_DUTY_FLAG_REALTIME -->
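To make the desired result concrete, here is a minimal sketch of the transformation on an in-memory string. It swaps the line-by-line loop for a single `re.sub` over the whole text, which is a different technique from the script above. The helper name `remove_cases`, the trimmed-down `sample`, and the assumption that the file holds exactly one ruleDefinition are all hypothetical.

```python
import re

def remove_cases(xml_text, words):
    """Drop every <case> block whose field="$NAME" is in `words` (hypothetical helper)."""
    names = '|'.join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    field = re.compile(r'field="\$({})"'.format(names))
    removed = []

    def drop(match):
        hit = field.search(match.group(0))
        if hit is None:
            return match.group(0)   # no listed field: keep the block untouched
        removed.append(hit.group(1))
        return ''                   # listed field: drop the whole block

    # Non-greedy with DOTALL, so each <case>...</case> block is handled separately.
    body = re.sub(r'[ \t]*<case\b.*?</case>[ \t]*\n?', drop, xml_text, flags=re.DOTALL)
    if removed:
        # Assumes a single ruleDefinition; the comment goes before its end tag.
        comment = '<!-- Removed ' + ', '.join(removed) + ' -->\n'
        body = body.replace('</ruleDefinition>', comment + '</ruleDefinition>', 1)
    return body

# Trimmed-down stand-in for ProcessAll.xml.
sample = '''<ruleDefinition name="ProcessAllFields">
<case label="0xFBDE">
<appendField field="$NORMALIZED_PRICE_REALTIME"/>
</case>
<case label="0xFBDF">
<appendField field="$STAMP_DUTY_FLAG_REALTIME"/>
</case>
</ruleDefinition>
'''

result = remove_cases(sample, ['NORMALIZED_PRICE_REALTIME', 'STAMP_DUTY_FLAG_REALTIME'])
print(result)
```

Like any regex-based approach to XML, this would break on nested <case> elements or on a `</case>` inside a comment, so it only fits input shaped like the sample.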