1

I have a file that uses an XML schema. It looks like this:

    <maplayer simplifyAlgorithm="0" minimumScale="0" maximumScale="2500" simplifyDrawingHints="0" readOnly="0" minLabelScale="0" maxLabelScale="1e+08" simplifyDrawingTol="1" geometry="Point" simplifyMaxScale="1" type="vector" hasScaleBasedVisibilityFlag="1" simplifyLocal="1" scaleBasedLabelVisibilityFlag="0">
      <id></id>
      <datasource>port=1521 user=test_user password=test_passwd</datasource>
      <keywordList>
        <value></value>
      </keywordList>
      <featformsuppress>0</featformsuppress>
      <editorlayout>generatedlayout</editorlayout>
      <widgets/>
      <conditionalstyles>
        <rowstyles/>
        <fieldstyles/>
      </conditionalstyles>
    </maplayer>
  </projectlayers>
  <properties>
    <Variables>
      <variableNames type="QStringList">
        <value>paswd</value>
        <value>user</value>
      </variableNames>
      <variableValues type="QStringList">
        <value>5zdgf</value>
        <value>dgdgdgfdg</value>
      </variableValues>
      </Variables>
    <customproperties>
    <property key="labeling/textColorR" value="0"/>
    <property key="labeling/textTransp" value="0"/>
    <property key="labeling/upsidedownLabels" value="0"/>
    <property key="labeling/useSubstitutions" value="false"/>
    <property key="labeling/wrapChar" value=""/>
    <property key="labeling/xOffset" value="0"/>
    <property key="labeling/yOffset" value="0"/>
    <property key="labeling/zIndex" value="0"/>
    <property key="variableNames"/>
    <property key="variableValues"/>
  </customproperties>

I want to use Python to delete the password and user parts as well as the variables parts. I use the following code:

import re

with open(r'C:\myfile.txt') as oldfile, open(r'C:\myfile_withoutPW.txt', 'w') as newfile:
    oldText = oldfile.read()
    noPass = re.sub(r'(password=).*?(?=\s) ', '', oldText.rstrip())
    noPass_noUser = re.sub(r'(user=).*?(?=\s) ', '', noPass.rstrip())
    # still missing
    newText = re.sub(re.escape(r'<property key="variableNames"/>'), '', noPass_noUser.rstrip())
    newText = re.sub(re.escape(r'<property key="variableValues"/>'), '', newText.rstrip())
    newfile.write(newText)

This works, but not completely as I wanted: it deletes the parts but leaves empty lines, like:

 <property key="labeling/wrapChar" value=""/>
        <property key="labeling/xOffset" value="0"/>
        <property key="labeling/yOffset" value="0"/>
        <property key="labeling/zIndex" value="0"/>


      </customproperties>
      <blendMode>0</blendMode>
      <featureBlendMo

How can I solve this so that those lines/parts are completely deleted from my txt file?

Max Mustermann

2 Answers

2

Processing XML with regex is risky. Suppose a property element spans more than one line: a pattern written for the single-line form would no longer match it. An alternative is to use Extensible Stylesheet Language Transformations (XSLT). I don't know all of your requirements, so I tried to match your example:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- pretty print output -->  
  <xsl:strip-space elements="*" />
  <xsl:output method="xml" indent="yes"/>

  <!-- strip unwanted elements and attributes -->  
  <xsl:template match="datasource|Variables|@user|@password"/>

  <!-- pass everything else through -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- start transform at the root -->
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
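
If an XSLT processor is not at hand, a similar parser-based cleanup can be sketched with Python's standard `xml.etree.ElementTree` module. This is a different technique from the stylesheet above, not a translation of it. It assumes the complete file is well-formed XML (the fragment in the question is not, on its own), and the names `scrub`, `DROP_TAGS`, `DROP_PROPERTY_KEYS` and the `<qgis>` wrapper in the sample are hypothetical choices for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical configuration for this sketch; adjust to the real file.
DROP_TAGS = {"Variables"}
DROP_PROPERTY_KEYS = {"variableNames", "variableValues"}
# Matches "user=..." or "password=..." up to the next whitespace.
CREDENTIAL = re.compile(r"\b(?:user|password)=\S+\s*")


def scrub(xml_text):
    """Return xml_text without credentials and variable elements."""
    root = ET.fromstring(xml_text)

    # ElementTree keeps no parent pointers, so visit every element as a
    # potential parent and remove the unwanted children from it.
    for parent in list(root.iter()):
        for child in list(parent):
            is_variable_property = (
                child.tag == "property"
                and child.get("key") in DROP_PROPERTY_KEYS
            )
            if child.tag in DROP_TAGS or is_variable_property:
                parent.remove(child)

    # Strip the credentials from the datasource text, keep the rest.
    for datasource in root.iter("datasource"):
        if datasource.text:
            datasource.text = CREDENTIAL.sub("", datasource.text).strip()

    return ET.tostring(root, encoding="unicode")


sample = """<qgis>
  <datasource>port=1521 user=test_user password=test_passwd</datasource>
  <Variables>
    <value>paswd</value>
  </Variables>
  <customproperties>
    <property key="labeling/zIndex" value="0"/>
    <property key="variableNames"/>
    <property key="variableValues"/>
  </customproperties>
</qgis>"""

print(scrub(sample))
```

Because a removed element takes its trailing whitespace with it, the deleted elements do not leave blank lines behind, although the indentation before a closing tag may end up slightly off.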
tdelaney
  • It's not good practice, but for my purpose it's good to have more matches because there are many instances I want to delete from the file. – Max Mustermann Apr 09 '18 at 06:26
  • Potentially... the match gets all instances of its target and if you perform the same action for each thing you match then the only advantage to multiple match lines is readability. – tdelaney Apr 09 '18 at 15:32
0

This seems workable for the output

(?mi)((?:password=|user=)[^\n]*$|\<property key=\"variableNames\"\/\>\n|\<property key=\"variableValues\"\/\>\n)

Demo: a newline \n is added to some parts of the regex so that deleting a match does not leave an empty line.

In Python, it might look like this:

import re

ss = """ copy&paste your string in this area """

regx = re.compile(r'(?mi)((?:password=|user=)[^\n]*$|\<property key=\"variableNames\"\/\>\n|\<property key=\"variableValues\"\/\>\n)')
print(regx.sub('', ss))

If you also want to remove the empty lines left behind after deleting the matched strings, you can try this regex, which matches empty lines in your text:

(?m)^\s*$\n

You can apply it to your script by inserting this line:

newText = re.sub(r'(?m)^\s*$\n','',newText)
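
Putting the pieces together, here is a minimal sketch of a variant that avoids creating the empty lines in the first place, by matching each property line together with its indentation and newline. One thing to watch: a pattern ending in `[^\n]*$` consumes everything else on that line too, such as a closing `</datasource>` tag, so this sketch stops at whitespace or `<` instead. The names `strip_secrets`, `CREDENTIALS` and `PROPERTY_LINES` are hypothetical, and the sketch assumes each property element sits on its own line, as in the question.

```python
import re

# "user=..." / "password=..." plus one leading blank, stopping at
# whitespace or at "<" so a closing tag on the same line survives.
CREDENTIALS = re.compile(r'[ \t]?\b(?:user|password)=[^\s<]*')

# The whole property line: indentation, element, trailing newline.
PROPERTY_LINES = re.compile(
    r'^[ \t]*<property key="variable(?:Names|Values)"/>[ \t]*\r?\n',
    re.MULTILINE,
)


def strip_secrets(text):
    """Remove credentials and the variable property lines from text."""
    text = CREDENTIALS.sub('', text)
    return PROPERTY_LINES.sub('', text)


sample = (
    '<datasource>port=1521 user=test_user password=test_passwd</datasource>\n'
    '<customproperties>\n'
    '    <property key="labeling/zIndex" value="0"/>\n'
    '    <property key="variableNames"/>\n'
    '    <property key="variableValues"/>\n'
    '</customproperties>\n'
)

print(strip_secrets(sample))
```

In the original script, the result of `oldfile.read()` would be passed to `strip_secrets` and the return value written to `newfile`. The sketch does not touch the `<Variables>` block, which spans several lines and is better handled with an XML-aware tool.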
Thm Lee