
So, my problem seems quite simple, but I'm stuck. I want to fill text elements in an XML file based on an id attribute. My XML is a PageXML file that looks like this:

<?xml version="1.0" encoding="UTF-8"  standalone="yes"?>
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15 http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd">
  <Metadata>
    <Creator>myself</Creator>
    <Created>2021-07-03T09:37:54.369908+00:00</Created>
    <LastChange>2021-07-03T09:37:54.369944+00:00</LastChange>

  </Metadata>
  <Page imageFilename="05.tif" imageWidth="3243" imageHeight="4077">

    <TextRegion id="eSc_dummyblock_">
      <TextLine id="eSc_line_b74d9f71" >
        <Coords points="1376,108 1390,67 1492,78 1492,166 1431,149 1407,166 1390,149 1376,156"/>
        <Baseline points="1380,112 1499,112"/>
        <TextEquiv>
          <Unicode></Unicode>
        </TextEquiv>
      </TextLine>

      <TextLine id="eSc_line_5aceacfb" >
        <Coords points="2882,173 2882,142 2947,125 2947,292 2920,288 2882,309"/>
        <Baseline points="2886,176 2954,176"/>
        <TextEquiv>
          <Unicode>toto</Unicode>
        </TextEquiv>
      </TextLine>
      
      </TextRegion>
    
  </Page>
</PcGts>

I just want to apply an XSLT template that fills each Unicode element with a different value according to the id attribute of its TextLine. Something like this should work, but nothing happens:

import lxml.etree as ET

dom = ET.parse(filename)
xslt_root = ET.XML(
'''<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes" encoding="UTF-8" omit-xml-declaration="no"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="@id[. = 'eSc_line_b74d9f71']/*/Unicode/text()[. = '']">something else</xsl:template>

</xsl:stylesheet>''')

transform = ET.XSLT(xslt_root)
newdom = transform(dom)

The desired output:

<?xml version="1.0" encoding="UTF-8"?>
<TextRegion id="eSc_dummyblock_">
  <TextLine id="eSc_line_b74d9f71">
    <Coords points="1376,108 1390,67 1492,78 1492,166 1431,149 1407,166 1390,149 1376,156"/>
    <Baseline points="1380,112 1499,112"/>
    <TextEquiv>
      <Unicode>something else</Unicode>
    </TextEquiv>
  </TextLine>
  <TextLine id="eSc_line_5aceacfb">
    <Coords points="2882,173 2882,142 2947,125 2947,292 2920,288 2882,309"/>
    <Baseline points="2886,176 2954,176"/>
    <TextEquiv>
      <Unicode/>
    </TextEquiv>
  </TextLine>
</TextRegion>

I would appreciate your help.

----SOLUTION---

As suggested by @michael.hor257k, the solution is to declare the same namespace in the XSLT stylesheet, bind it to a prefix (here met), and use that prefix for every element name in the match pattern:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:met="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
exclude-result-prefixes="met">

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="met:TextLine[@id='eSc_line_b74d9f71']/met:TextEquiv/met:Unicode">
    <xsl:copy>something else</xsl:copy>
</xsl:template>

</xsl:stylesheet>
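For comparison, here is the same idea outside XSLT: a minimal sketch using Python's standard-library xml.etree.ElementTree (not lxml, and not part of the original solution) that fills each Unicode element from a mapping keyed by TextLine id, which is the more general goal stated in the question. The trimmed input, the pc prefix, the NEW_TEXT mapping and the fill_unicode helper are illustrative assumptions. As in the stylesheet, the key step is qualifying every element name with the PageXML namespace URI.

```python
import xml.etree.ElementTree as ET

PC_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
NS = {"pc": PC_NS}  # prefix chosen for illustration; any prefix works

# Serialize the PageXML namespace as the default namespace instead of "ns0:".
ET.register_namespace("", PC_NS)

# Trimmed-down copy of the question's input (Metadata, Coords, Baseline omitted).
PAGE_XML = f"""<PcGts xmlns="{PC_NS}">
  <Page imageFilename="05.tif">
    <TextRegion id="eSc_dummyblock_">
      <TextLine id="eSc_line_b74d9f71">
        <TextEquiv><Unicode></Unicode></TextEquiv>
      </TextLine>
      <TextLine id="eSc_line_5aceacfb">
        <TextEquiv><Unicode>toto</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

# Hypothetical mapping: TextLine id -> text to write into its Unicode element.
NEW_TEXT = {"eSc_line_b74d9f71": "something else"}


def fill_unicode(root, new_text):
    """Illustrative helper (not from the original answers).

    Sets TextEquiv/Unicode text for every TextLine whose id is in new_text;
    lines not in the mapping are left untouched, like the identity transform.
    """
    for line in root.iterfind(".//pc:TextLine", NS):
        value = new_text.get(line.get("id"))
        if value is None:
            continue
        for uni in line.iterfind("pc:TextEquiv/pc:Unicode", NS):
            uni.text = value
    return root


root = fill_unicode(ET.fromstring(PAGE_XML), NEW_TEXT)
print(ET.tostring(root, encoding="unicode"))
```

Leaving unmatched lines alone mirrors the stylesheet above, where only the targeted Unicode element has its own template and everything else goes through the identity transform.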
    Please edit your question and add the expected result. Also clarify exactly what you mean by *"a different value according to the TextLine id attribute*". – michael.hor257k Jul 25 '21 at 20:31

2 Answers


How about changing your line

<xsl:template match="@id[. = 'eSc_line_b74d9f71']/*/Unicode/text()[. = '']">something else</xsl:template>

to

<xsl:template match="*[@id = 'eSc_line_b74d9f71']/TextEquiv/Unicode"><Unicode>something else</Unicode></xsl:template>

or, a more general version

<xsl:template match="*[@id = 'eSc_line_b74d9f71']/TextEquiv/Unicode"><xsl:copy>something else</xsl:copy></xsl:template>

With your input, this will give you the output

<?xml version="1.0" encoding="UTF-8"?>
<TextRegion id="eSc_dummyblock_">
  <TextLine id="eSc_line_b74d9f71">
    <Coords points="1376,108 1390,67 1492,78 1492,166 1431,149 1407,166 1390,149 1376,156"/>
    <Baseline points="1380,112 1499,112"/>
    <TextEquiv>
      <Unicode>something else</Unicode>
    </TextEquiv>
  </TextLine>
  <TextLine id="eSc_line_5aceacfb">
    <Coords points="2882,173 2882,142 2947,125 2947,292 2920,288 2882,309"/>
    <Baseline points="2886,176 2954,176"/>
    <TextEquiv>
      <Unicode/>
    </TextEquiv>
  </TextLine>
</TextRegion>

That should be as desired.

zx485
  • Hi there, thanks for your answer. Both options are logical, but sadly neither of them produces the desired output; in fact, nothing happens. Could it be a matter of header declarations? – Magistermilitum Jul 25 '21 at 20:06
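The "nothing happens" symptom is typical of matching unprefixed names against a document that has a default namespace (the xmlns="..." on PcGts). In XPath 1.0, an unprefixed name in a path or pattern means "no namespace", so patterns like TextLine or Unicode cannot match the PageXML elements. The sketch below reproduces the same effect with Python's standard-library xml.etree.ElementTree rather than lxml/XSLT (ElementTree has no XSLT support); the trimmed input and the pc prefix are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the question's PageXML (Coords/Baseline omitted).
PAGE_XML = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="05.tif">
    <TextRegion id="eSc_dummyblock_">
      <TextLine id="eSc_line_b74d9f71">
        <TextEquiv><Unicode></Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

root = ET.fromstring(PAGE_XML)

# The default namespace is inherited by every descendant, so the real
# element name is {namespace-URI}Page, not plain "Page".
print(root[0].tag)

# An unprefixed name only matches elements in no namespace: nothing is found.
plain = root.findall(".//Unicode")

# Qualifying the name with the namespace URI (via a prefix map) finds it.
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}
qualified = root.findall(".//pc:Unicode", NS)

print(len(plain), len(qualified))  # 0 1
```

Changing the XML header would not help here; what matters is that the names in the stylesheet are bound to the same namespace URI as the input.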

If you want to add a value to the Unicode element, then have your template match the Unicode element:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="Unicode">
    <xsl:copy>
        <xsl:if test="../../@id='eSc_line_b74d9f71'">something else</xsl:if>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>
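The xsl:if variant has a side effect worth spelling out: because xsl:copy copies only the element itself and this template never applies templates to its children, any existing content of the other Unicode elements is dropped (which is why "toto" becomes <Unicode/> in the desired output). Below is a rough Python transliteration using the standard-library ElementTree, for illustration only; the trimmed input and TARGET_ID name are assumptions. It uses the same un-namespaced names as the stylesheet, so, like the stylesheet, it would find nothing in the real PageXML, whose elements live in a default namespace.

```python
import xml.etree.ElementTree as ET

# Un-namespaced input, matching the plain element names used by the stylesheet.
DOC = """<TextRegion id="eSc_dummyblock_">
  <TextLine id="eSc_line_b74d9f71"><TextEquiv><Unicode></Unicode></TextEquiv></TextLine>
  <TextLine id="eSc_line_5aceacfb"><TextEquiv><Unicode>toto</Unicode></TextEquiv></TextLine>
</TextRegion>"""

TARGET_ID = "eSc_line_b74d9f71"  # illustrative name for the id being tested

root = ET.fromstring(DOC)
for line in root.iter("TextLine"):
    for uni in line.iterfind("TextEquiv/Unicode"):
        # Like <xsl:copy> without <xsl:apply-templates>: the old text is discarded,
        # and new text is written only when the enclosing TextLine (../..) matches.
        uni.text = "something else" if line.get("id") == TARGET_ID else None

print(ET.tostring(root, encoding="unicode"))
```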

Or simply:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="TextLine[@id='eSc_line_b74d9f71']/TextEquiv/Unicode">
    <xsl:copy>something else</xsl:copy>
</xsl:template>

</xsl:stylesheet>

Note also that an attribute has no children, and a text node cannot be empty. Either of these alone is enough to ensure that your template will never match anything.
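The empty-text-node point has a rough analogue in Python's standard-library xml.etree.ElementTree (used here only as an illustration; its data model is not the XPath data model): an element written as <Unicode></Unicode> carries no character data at all, so its .text is None rather than an empty string. In XPath terms, there is simply no text node there for text()[. = ''] to select. Attributes, likewise, are just name/value pairs with nothing beneath them.

```python
import xml.etree.ElementTree as ET

empty = ET.fromstring("<Unicode></Unicode>")
filled = ET.fromstring("<Unicode>toto</Unicode>")

# <Unicode></Unicode> and <Unicode/> parse identically: no character data,
# so ElementTree reports None instead of an empty string.
print(repr(empty.text))   # None
print(repr(filled.text))  # 'toto'

# Attributes are plain name/value strings, not nodes with children.
line = ET.fromstring('<TextLine id="eSc_line_b74d9f71"/>')
print(line.attrib)  # {'id': 'eSc_line_b74d9f71'}
```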

michael.hor257k
  • Thanks for your answer. Indeed, I think the problem is linked to the empty text, but even when the Unicode element is not empty I don't get any transformation with the suggested XSLT templates. I have now added a full example of my XML. – Magistermilitum Jul 25 '21 at 22:51
  • You can see my code working here: https://xsltfiddle.liberty-development.net/93wniUp – michael.hor257k Jul 25 '21 at 22:54
  • See here why my code does not work with your actual input and how to fix it: https://stackoverflow.com/a/34762628/3016153 – michael.hor257k Jul 25 '21 at 23:01
  • Ah, thank you so much, my friend! I was stuck on this problem for hours. The solution was to declare the same namespace in my XSLT stylesheet. – Magistermilitum Jul 25 '21 at 23:19