The resulting output: a txt file with empty lines.

The expected output: a txt file containing the words of the text "Привет Мир! Это я, обычный неработающий текст или рыба".

What am I doing wrong? I also tried nested xsl:for-each instructions, and they produce the same behavior.

Oleg
  • This is the usual FAQ problem: the input has a default namespace set up that is in scope for all descendants, so the `div` and `span` elements are in the XHTML namespace, while your XSLT/XPath 1.0 attempt tries to select `div` or `span` elements in no namespace. An easy fix exists in XSLT 2 or 3: declare `xpath-default-namespace="http://www.w3.org/1999/xhtml"` on the `xsl:stylesheet`. This needs an XSLT 2 or 3 processor like Saxon 9, 10 or 11, which is available in the context of Java FOP without problems. – Martin Honnen May 28 '22 at 06:26
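
The XSLT 2.0 route described in the comment above could look roughly like the following. This is only a sketch: the text output method and the newline after each word are assumptions based on the expected txt result, not taken from the original stylesheet, and it needs an XSLT 2.0 or later processor such as Saxon.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- xpath-default-namespace makes unprefixed element names in XPath
     expressions and match patterns refer to the XHTML namespace -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.w3.org/1999/xhtml">

  <!-- Assumption: plain-text output, since the question expects a txt file -->
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- Same path as in the question, with the stray spaces removed -->
    <xsl:for-each select="//div[@class='ocr_page']/div[@class='ocr_carea']/p[@class='ocr_par']/span[@class='ocr_line']/span[@class='ocrx_word']">
      <!-- The context node is already the word span, so use "." -->
      <xsl:value-of select="normalize-space(.)"/>
      <!-- Assumption: one word per line -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With this attribute in place, the XPath expressions themselves need no prefixes.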

1 Answer

I see 2 problems in your attempt:

  1. Your instruction:

    <xsl:for-each select="//div [@class='ocr_page'] /div [@class='ocr_carea'] / p [@class='ocr_par'] / span[@class='ocr_line'] / span [@class='ocrx_word']">
    

    selects nothing, because your input XML puts all its elements in a namespace, while the unprefixed names in your XPath expression only match elements in no namespace. See here how to solve this.

  2. Once you have it working, this instruction will put you in the context of span. From this context, your next instruction:

     <xsl:value-of select="normalize-space(span [@class='ocrx_word'])" disable-output-escaping="yes"/>
    

    also selects nothing, because span is not a child of itself. It should be:

    <xsl:value-of select="normalize-space(.)"/>
    

    and I doubt you want to disable output escaping in a stylesheet producing an XML result.

michael.hor257k
  • I understand the second problem and have corrected it. Could you, @michael.hor257k, please explain the first problem more clearly than your link does? – Oleg May 28 '22 at 07:41
  • Not sure what's not clear: add, say, `xmlns:x="http://www.w3.org/1999/xhtml"` to your `xsl:stylesheet` start tag, then add the `x:` prefix to all element names in your XPath expression. – michael.hor257k May 28 '22 at 07:49
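
Putting both fixes from the answer together for XSLT 1.0, a minimal sketch might look like this. The `x` prefix is the arbitrary one suggested in the comment above; the text output method and the newline separator are assumptions made to match the expected txt result, since the original stylesheet is not shown in full.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Bind a prefix (here "x") to the XHTML namespace used by the input -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:x="http://www.w3.org/1999/xhtml">

  <!-- Assumption: plain-text output, since the question expects a txt file -->
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- Every element name carries the x: prefix; attributes such as
         @class are in no namespace and stay unprefixed -->
    <xsl:for-each select="//x:div[@class='ocr_page']/x:div[@class='ocr_carea']/x:p[@class='ocr_par']/x:span[@class='ocr_line']/x:span[@class='ocrx_word']">
      <!-- The context node is the word span itself -->
      <xsl:value-of select="normalize-space(.)"/>
      <!-- Assumption: one word per line -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The prefix only has to be bound to the same namespace URI as the input's default namespace; the input document itself does not need to use it.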