3

I am using many different regex implementations, as this comes up on several systems (Linux, Windows, VS, Notepad++, etc.); it's wherever I have a customer who wants to remove auto sizing. The intent is to use regex in whichever tool to find any line that has a width but doesn't have an AutoWidth, and then add the AutoWidth. I am just asking how to find it, but I intend to use what I find here in a replacement string for my given editor. I have the replacement bit down; I just haven't figured out how to match one without the other when the other is far away from the one.

Using https://regex101.com/ I've tried literally dozens of search strings.

This was my starting point for a search string, along with a couple of attempts to get the lookarounds to exclude AutoWidth anywhere on the line. Strings 2 and 3 are basically the same thing, but I didn't know what else to try. I assume that anything that works for the lookbehind will work for the lookahead, but as you can see, I can't even get the lookbehind working.

(?<! AutoWidth="false") width="\d*"(?! AutoWidth="false")
(?<! AutoWidth="false").*? width="\d*"(?! AutoWidth="false")
(?<! AutoWidth="false")[0-9a-zA-Z" =]*? width="\d*"(?! AutoWidth="false")

I'm stuck: putting distance between AutoWidth and width is eluding me.

These are my targets

1->  <column name="Selected" AutoWidth="false" IsEditable="true" datatype="bool" width="20"/>
2->  <column width="40" AutoWidth="false" name="ExternalIdOrEmpty" index="XIDname" sort="true"/>
3->  <column width="40" name="Tax Rate" index="TRname" sort="true" AutoWidth="false"/>
4->  <column width="40" name="Total Tax" index="TTname" sort="true"/>
5->  <column name="Tax Deductible" index="TDname" sort="true"/>

I want to find all lines that contain

width="\d*"

but that do not contain

AutoWidth="\d*"

anywhere on the same line.

This means that only line 4, in the samples above, would match my criteria.

UPDATE:

I am willing to use any other tool that will get the job done. So XSLT, etc. are all good. The only requirement there is that the tool is generally available on Windows, Linux, & Mac, AND is either open source or free and is also well known.

The full XML is huge. The edit function here is limited to 30,000 characters, but here's a good sample.

<?xml version="1.0" encoding="utf-8" ?>
<spread>
  <ViewPatientOutboundReferralFilter>
    <FindColumn name="ViewUid" index="guid" visible="false" />
    <FindColumn name="Selected" caption=" " visible="true" IsEditable="true" datatype="bool"/>
    <FindColumn name="PatientName" caption="Patient Name" visible="true" width="150" hyperlink="true" AutoWidth="false"/>
    <FindColumn name="ReferToProviderName" caption="Provider" visible="true" AutoWidth="false" width="150" hyperlink="true"/>
    <FindColumn name="ReferredToMedicalServicesProviderName" caption="Medical Services Provider" visible="true" width="150" hyperlink="true"/>
    <FindColumn name="ProviderRole" caption="Provider Role" visible="true" width="80" hyperlink="true"/>
    <FindColumn name="StatusName" caption="Current Status" visible="true" width="100" hyperlink="true"/>
    <FindColumn name="ServiceSiteName" caption="Service Site" visible="true"/>
    <FindColumn name="VisitDate" caption="Visit Date" visible="true" width="90" datatype="date"/>
    <FindColumn name="AppointmentDate" caption="Appointment Date" visible="true" datatype="datetime" width="90"/>
    <FindColumn name="Notes" caption="Comments" visible="true" width="120"/>
    <FindColumn name="AppointmentNotes" caption="Referral Notes" visible="true" width="120"/>
    <FindColumn name="DisplayName" visible="false" index="name"  />
    <FindColumn name="ProviderUid" visible="false" storeproperty="true" />
    <FindColumn name="VisitUid" visible="false" storeproperty="true" />
    <FindColumn name="CreatedDate" caption="Created Date" visible="true" datatype="date" width="90"/>
    <FindColumn name="RequestingName" caption="Requesting Provider" visible="true" width="150" />
  </ViewPatientOutboundReferralFilter>
  <FeeScheduleFeeAA rowcount="3">
    <column row="0" rowspan="3" caption="Code" width="50" name="Procedure.Code" sort="true" index="name" />
    <column row="0" rowspan="3" caption="Description" relwidth="100%" width="80" AutoWidth="false" name="Procedure.ShortDescription" sort="true" />
    <column row="0" rowspan="3" caption="Amount Allowed" width="60" AutoWidth="false" name="Fee" IsEditable="true" datatype="currency" />
    <column row="0" rowspan="3" caption="Global Period" width="40" AutoWidth="false" name="GlobalPeriodDays" IsEditable="true" datatype="number" decimalPlaces="0" minValue="0" maxValue="1000" />
    <column row="0" colspan="5" caption="Coinsurance" />
    <column row="1" colspan="2" caption="Insurance Percent" />
    <column row="2" caption=" " width="30" AutoWidth="false" name="RadioInsurancePercent" IsEditable="true" datatype="radio" radioOrientation="vertical" radioItems=" " />
    <column row="2" caption="Value" width="70" AutoWidth="false" name="InsurancePercent" IsEditable="true" datatype="number" decimalPlaces="0" minValue="0" maxValue="100" />
    <column row="1" colspan="2" caption="Insurance Plan" />
    <column row="2" caption="PCP/Specialist" width="95" AutoWidth="false" name="RadioInsurancePlanPhysician" IsEditable="true" datatype="radio" radioOrientation="vertical" radioItems=" " />
    <column row="2" caption="Other" width="55" AutoWidth="false" name="RadioInsurancePlanOther" IsEditable="true" datatype="radio" radioOrientation="vertical" radioItems=" " />
    <column row="1" rowspan="2" caption="Copay Amount" width="70" AutoWidth="false" name="FixedCopayAmount" datatype="currency" IsEditable="true" />
    <column row="0" rowspan="3" caption="Contract Type" width="55" AutoWidth="false" name="ContractTypeCode.Name" sort="true"/>
    <column row="0" rowspan="3" caption="Family Planning" width="55" AutoWidth="false" name="FamilyPlanning" IsEditable="true" datatype="bool" />
    <column row="0" rowspan="3" caption="Alt Insurance Plan" width="55" AutoWidth="false" name="UseAlternateInsurancePlan" IsEditable="true" datatype="bool" />
    <column row="0" rowspan="3" caption="Edit Billing Rule" width="70" visible="false" IsEditable="true" datatype="CustomCellType" celltype="iMedica.Prm.Client.UI.BaseControls.Spread.PrmNeoCellImageButton,iMedica.Prm.Client.UI.BaseControls" ShowSortIndicator="false" ImageResourceName="iMedica.Prm.Client.UI.BaseControls.Icons.BillingRule.png" ImageResourceAssembly="iMedica.Prm.Client.UI.BaseControls" sort="false" />
  </FeeScheduleFeeAA>
</spread>
Display name
  • Learn how to use an html parser and your days will be happier. For python: `beautifulsoup`, for java: `jsoup`, for php: `DOMDocument` and so on... – Pedro Lobito Apr 20 '17 at 14:46
  • @PedroLobito What is the html parser you speak of? And can it help me with my XML problem? – Display name Apr 20 '17 at 14:50
  • It's easy-peasy with xpath. The regex solution looks really hard though. – Tamas Rev Apr 20 '17 at 14:52
  • @TamasRev, does this mean you have an xslt solution to my problem? I would accept that too. I am desperate to find a way to automate these changes. – Display name Apr 20 '17 at 14:54
  • I'm preparing an answer for you. You need to set `autowidth=true` on every column and remove `width` if it exists, right? What does the XML look like? A bigger example would be helpful. – Pedro Lobito Apr 20 '17 at 14:54
  • @PedroLobito, close, but I need both attributes to be there. Replace the width="\d*" so that the result is: width="\d*" AutoWidth="false" – Display name Apr 20 '17 at 14:56
  • Please update your question and post a more complete input code and an example of the desired output. Is this `xml` or `html`? – Pedro Lobito Apr 20 '17 at 14:58
  • Does it need to be regex? xpath is a lot easier in this situation: `//column[@width and not(@AutoWidth)]` – 3limin4t0r Apr 20 '17 at 15:04
  • Added an `xpath` solution to select only the right elements. It's sort of impossible to fix with regex because it would need a variable-length negative lookbehind, which usually isn't supported. On the other hand, it's easy to find these lines with `grep` and **two** regexes. I'm adding an answer for that too. – Tamas Rev Apr 20 '17 at 15:08
  • Also, `AutoWidth="\d*"` will never result in a match since `AutoWidth` is always true or false and doesn't contain digits. So this should be `AutoWidth="\w*"` or `AutoWidth="(?:true|false)"`. – 3limin4t0r Apr 20 '17 at 15:08
  • Obligatory reference: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454. – michael.hor257k Apr 20 '17 at 15:13
  • @JohanWentholt, I don't understand the AutoWidth="(?:true|false)" bit. Can you explain? – Display name Apr 20 '17 at 15:48
  • @michael.hor257k, the obligatory reference was so funny, I choked on my lunch while reading it. I'm sorry for accidentally summoning tainted souls, and giving succor to unholy children. I definitely don't want the Russian hackers. I've learned my lesson: I'll not use regex to fix up (xml|html|xhtml) any more. – Display name Apr 20 '17 at 16:37
  • @Dysmondad `AutoWidth="(?:true|false)"` means the same as `AutoWidth="(true|false)"`. This means the text `AutoWidth="` followed by either `true` or `false` followed by `"`. The `?:` at the start of the group means that it's a non-capturing group. Most programming languages and editors let you refer back to groups, generally using `$1` or `\1` for the first group, `$2` or `\2` for the second group. By adding `?:` you exclude it from back reference. I use this a lot when I know I don't need the captured data. – 3limin4t0r Apr 20 '17 at 16:42
  • There is no concept of a "line" in XML. – Michael Kay Apr 20 '17 at 17:32

3 Answers

6

This is a rather trivial problem in XSLT. Given a well-formed input such as:

XML

<root>
    <column name="Selected" AutoWidth="false" IsEditable="true" datatype="bool" width="20"/>
    <column width="40" AutoWidth="false" name="ExternalIdOrEmpty" index="XIDname" sort="true"/>
    <column width="40" name="Tax Rate" index="TRname" sort="true" AutoWidth="false"/>
    <column width="40" name="Total Tax" index="TTname" sort="true"/>
    <column name="Tax Deductible" index="TDname" sort="true"/>
</root>

the following stylesheet:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="column/@width[not(../@AutoWidth)]">
    <xsl:copy/>
    <xsl:attribute name="AutoWidth">false</xsl:attribute>
</xsl:template>

</xsl:stylesheet>

will return:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <column name="Selected" AutoWidth="false" IsEditable="true" datatype="bool" width="20"/>
  <column width="40" AutoWidth="false" name="ExternalIdOrEmpty" index="XIDname" sort="true"/>
  <column width="40" name="Tax Rate" index="TRname" sort="true" AutoWidth="false"/>
  <column width="40" AutoWidth="false" name="Total Tax" index="TTname" sort="true"/>
  <column name="Tax Deductible" index="TDname" sort="true"/>
</root>

This matches a width attribute that does not have a sibling AutoWidth, copies it and adds the missing sibling. Here I have limited the scope to column elements only, but you can extend it to any element by doing:

<xsl:template match="@width[not(../@AutoWidth)]">
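If an XSLT processor is not at hand, the same rule (an element has width but no AutoWidth, so add AutoWidth) can be sketched in Python with the standard library. This is only an illustration under assumptions: the function name `add_missing_autowidth` and the `SAMPLE` document are made up for the example, the new attribute is appended after the existing ones rather than placed next to width, and ElementTree drops comments and the XML declaration unless told otherwise.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input, mirroring the well-formed example above.
SAMPLE = """<root>
    <column name="Selected" AutoWidth="false" IsEditable="true" datatype="bool" width="20"/>
    <column width="40" AutoWidth="false" name="ExternalIdOrEmpty" index="XIDname" sort="true"/>
    <column width="40" name="Tax Rate" index="TRname" sort="true" AutoWidth="false"/>
    <column width="40" name="Total Tax" index="TTname" sort="true"/>
    <column name="Tax Deductible" index="TDname" sort="true"/>
</root>"""


def add_missing_autowidth(xml_text, value="false"):
    """Return the document with AutoWidth added wherever width exists without it."""
    root = ET.fromstring(xml_text)
    for element in root.iter():  # every element, like the @width[...] variant
        if "width" in element.attrib and "AutoWidth" not in element.attrib:
            element.set("AutoWidth", value)
    return ET.tostring(root, encoding="unicode")


fixed = add_missing_autowidth(SAMPLE)
print(fixed)
```

Everything that does not meet the condition is written back unchanged, which plays the same role as the identity transform in the stylesheet.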
michael.hor257k
  • That's a good starting point. Is there a way to emit the entire document with the fix-ups applied? The reason I was using regex is that I can use the replacement aspect too, and thereby fix the document. I guess I was not clear on that point. – Display name Apr 20 '17 at 15:40
  • @Dysmondad That's exactly what this does: all the nodes not matched by the 2nd template are handled by the *identity transform* template - i.e. copied *as is*. – michael.hor257k Apr 20 '17 at 15:46
  • Thank you. That works with Notepad++ via the XML plugins->Transform XML. Exactly what I need. – Display name Apr 20 '17 at 15:59
4

The xpath is this: //column[@width and not(@AutoWidth)].

Explanation:

  • //column finds all <column ...> elements
  • [...] contains the predicates
  • @width checks the presence of the @width attribute
  • not(@AutoWidth) checks the absence of the @AutoWidth attribute.

I used the xpath tester on freeformatter.com for testing.

I added a <foo> element to make it well-formed XML. I.e. this was the actual XML I used for testing:

<foo>
  <column name="Selected" AutoWidth="false" IsEditable="true" datatype="bool" width="20"/>
  <column width="40" AutoWidth="false" name="ExternalIdOrEmpty" index="XIDname" sort="true"/>
  <column width="40" name="Tax Rate" index="TRname" sort="true" AutoWidth="false"/>
  <column width="40" name="Total Tax" index="TTname" sort="true"/>
  <column name="Tax Deductible" index="TDname" sort="true"/>
</foo>

Then, this is the xpath: //column[@width and not(@AutoWidth)]

It selects only one item: <column index="TTname" name="Total Tax" sort="true" width="40"/>. I believe this is what you need.
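For scripting the same selection, here is a minimal Python sketch using the standard library. Note the assumption: ElementTree only implements a subset of XPath and does not support `not(...)`, so the `[@width]` predicate is done in the path and the absence of AutoWidth is checked in plain Python. The function name `columns_missing_autowidth` and the `XML` constant are made up for the example.

```python
import xml.etree.ElementTree as ET

# The same test document as above, wrapped in <foo>.
XML = """<foo>
  <column name="Selected" AutoWidth="false" IsEditable="true" datatype="bool" width="20"/>
  <column width="40" AutoWidth="false" name="ExternalIdOrEmpty" index="XIDname" sort="true"/>
  <column width="40" name="Tax Rate" index="TRname" sort="true" AutoWidth="false"/>
  <column width="40" name="Total Tax" index="TTname" sort="true"/>
  <column name="Tax Deductible" index="TDname" sort="true"/>
</foo>"""


def columns_missing_autowidth(xml_text):
    """Emulate //column[@width and not(@AutoWidth)]."""
    root = ET.fromstring(xml_text)
    # .//column[@width] covers the first predicate; the second one is
    # checked in Python because ElementTree has no not() function.
    return [
        column
        for column in root.findall(".//column[@width]")
        if "AutoWidth" not in column.attrib
    ]


selected = columns_missing_autowidth(XML)
print([column.get("name") for column in selected])  # ['Total Tax']
```

A full XPath 1.0 engine would accept the original expression directly; the split above is only needed because of the stdlib limitation.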

Tamas Rev
1

There is another quick solution with grep. It needs a bash shell, e.g. Git Bash on Windows.

cat lines.txt | grep -P -v 'AutoWidth="[^"]*"' | grep -P 'width="[^"]*"'

Explanation:

  • cat lines.txt - this is where your data comes from
  • grep -P enables Perl syntax for the sake of simplicity
  • grep -v keeps only the non-matching lines
  • "[^"]*" matches everything between the quotes, but doesn't go further after the first quote

This is the result with your example data:

4->  <column width="40" name="Total Tax" index="TTname" sort="true"/>
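The two greps can also be folded into one pattern if the lookaheads are anchored at the start of the line instead of next to width=, which avoids the variable-length lookbehind. The following is a sketch in Python, assuming one element per line (XML itself does not guarantee that); `PATTERN`, `FIX`, `add_autowidth` and `SAMPLE` are names invented for the example. The `\b` keeps `width=` from matching inside `relwidth=`.

```python
import re

# Reject the line if AutoWidth=" occurs anywhere on it, then require a
# standalone width="digits" somewhere on the same line.
PATTERN = re.compile(r'^(?!.*\bAutoWidth=")(?=.*\bwidth="\d+").*$', re.MULTILINE)

# Same condition, but capturing everything up to and including width="..."
# so AutoWidth="false" can be inserted right after it.
FIX = re.compile(r'^(?!.*\bAutoWidth=")(.*\bwidth="\d+")', re.MULTILINE)

# The question's five target lines, without the "N->" prefixes.
SAMPLE = """\
<column name="Selected" AutoWidth="false" IsEditable="true" datatype="bool" width="20"/>
<column width="40" AutoWidth="false" name="ExternalIdOrEmpty" index="XIDname" sort="true"/>
<column width="40" name="Tax Rate" index="TRname" sort="true" AutoWidth="false"/>
<column width="40" name="Total Tax" index="TTname" sort="true"/>
<column name="Tax Deductible" index="TDname" sort="true"/>
"""


def add_autowidth(text):
    """Append AutoWidth="false" after width on lines that lack AutoWidth."""
    return FIX.sub(r'\1 AutoWidth="false"', text)


print(PATTERN.findall(SAMPLE))  # only the "Total Tax" line
print(add_autowidth(SAMPLE))
```

The same two patterns should carry over to other engines that support lookaheads and a multiline mode, but that has to be checked per tool.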
Tamas Rev