
I have a plain text file structured like this:

!ITEM_NAME
Item value
!ANOTHER_ITEM
Its value
...

Is it possible to use XSLT to produce a file similar to this:

<?xml version="1.0" encoding="UTF-8" ?>
<document>
  <ITEM_NAME>Item value</ITEM_NAME>
  <ANOTHER_ITEM>Its value</ANOTHER_ITEM>
  ...
</document>

EDIT

I am sorry I did not state this clearly before: I am trying to accomplish this transformation with the XSLT engine of Visual Studio 2005. I have tried both of the provided solutions, and I am sure they are correct, but Visual Studio 2005 does not recognize the unparsed-text function.

sblandin
  • This is not possible, as the only valid input to XSLT is well-formed XML. – O. R. Mapper Apr 12 '13 at 14:48
  • @O.R.Mapper According to this question, it is possible: [link](http://stackoverflow.com/questions/5675889/regular-text-file-to-xml-using-xslt) – sblandin Apr 12 '13 at 14:53
  • Interesting. Today's entry in the "A lightbulb can *also* be used to fry a steak." list ... thanks for the link :-) – O. R. Mapper Apr 12 '13 at 15:06
  • @O.R.Mapper Or "using a cannon to shoot a fly". Any other suggestions for making the input text file more "manageable"? Maybe with the .NET Framework? ;-) – sblandin Apr 12 '13 at 15:12
  • Well can you use an XSLT 2.0 processor like Saxon 9 or AltovaXML or XmlPrime? – Martin Honnen Apr 12 '13 at 17:30
  • Dear @O.R.Mapper, FYI, XSLT 2.0 has been successfully and efficiently used in implementing a generic LR-1 parser and, based on this, parsers for such languages as JSON and XPath 2.0. The use of modern parsing methods, or of regular expressions where they are most suitable, is exactly the opposite of "A lightbulb can also be used to fry a steak". – Dimitre Novatchev Apr 13 '13 at 04:12
  • @DimitreNovatchev: I am not convinced of the suitability when regular expressions with capturing groups already have to be used for a basic task such as reading a file line by line and recognizing the first letter of some lines, but it's good to know XSLT can be used beyond its originally intended scope when required :-) – O. R. Mapper Apr 13 '13 at 11:17
  • @O.R.Mapper, Well, your data is 6-7 years old. XPath 3.0 has a standard function for getting the lines of a text file -- `unparsed-text-lines()`: http://www.w3.org/TR/xpath-functions-30/#func-unparsed-text-lines . The F & O 3.0 is already a Candidate Recommendation of the W3C. As for the first letter of a line, one can use even the XPath 1.0 function `substring`. In XPath 3.0 the function `head()` (http://www.w3.org/TR/xpath-functions-30/#func-head) provides generic access to the head of any sequence. – Dimitre Novatchev Apr 13 '13 at 14:41
  • @DimitreNovatchev: I based my response on your answer from 11 hours ago, which I assumed would consider the current state of development. – O. R. Mapper Apr 13 '13 at 14:55
  • @O.R.Mapper, My answer is for someone who most probably has little or no awareness of XSLT 2.0 -- or is limited by their management to use only XSLT 1.0. Very few people even know XSLT 3.0/XPath 3.0 exists. In case you are one of these, you can see what is easily possible in pure XPath 3.0 only, here: http://www.xfront.com/Pearls-of-XSLT-and-XPath-3-0-Design.pdf – Dimitre Novatchev Apr 13 '13 at 15:01
  • @DimitreNovatchev: Thanks for the link, that looks interesting :) – O. R. Mapper Apr 13 '13 at 15:09

2 Answers


If you can use XSLT 2.0 you could use unparsed-text()...

Text File (Do not use the text file as direct input to the XSLT.)

!ITEM_NAME
Item value
!ANOTHER_ITEM
Its value
!TEST_BANG
Here's a value with !bangs!!!

XSLT 2.0 (Apply this stylesheet to itself, that is, use the stylesheet as the XML input. You will also have to change the path to your text file, and possibly the encoding.)

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:param name="text-encoding" as="xs:string" select="'iso-8859-1'"/>
    <xsl:param name="text-uri" as="xs:string" select="'file:///C:/Users/dhaley/Desktop/test.txt'"/>

    <xsl:template name="text2xml">
        <xsl:variable name="text" select="unparsed-text($text-uri, $text-encoding)"/>
        <xsl:analyze-string select="$text" regex="!(.*)\n(.*)">
            <xsl:matching-substring>
                <xsl:element name="{normalize-space(regex-group(1))}">
                    <xsl:value-of select="normalize-space(regex-group(2))"/>
                </xsl:element>
            </xsl:matching-substring>
        </xsl:analyze-string>
    </xsl:template>

    <xsl:template match="/">
        <document>
            <xsl:choose>
                <xsl:when test="unparsed-text-available($text-uri, $text-encoding)">
                    <xsl:call-template name="text2xml"/>                                
                </xsl:when>
                <xsl:otherwise>
                    <xsl:variable name="error">
                        <xsl:text>Error reading "</xsl:text>
                        <xsl:value-of select="$text-uri"/>
                        <xsl:text>" (encoding "</xsl:text>
                        <xsl:value-of select="$text-encoding"/>
                        <xsl:text>").</xsl:text>
                    </xsl:variable>
                    <xsl:message><xsl:value-of select="$error"/></xsl:message>
                    <xsl:value-of select="$error"/>
                </xsl:otherwise>
            </xsl:choose>
        </document>
    </xsl:template>
</xsl:stylesheet>

XML Output

<document>
   <ITEM_NAME>Item value</ITEM_NAME>
   <ANOTHER_ITEM>Its value</ANOTHER_ITEM>
   <TEST_BANG>Here's a value with !bangs!!!</TEST_BANG>
</document>
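
For comparison, the same pairing of name lines and value lines can probably be done without capturing groups, by splitting the text into lines with tokenize(). The following is only a sketch, not part of the tested solution above. It assumes that every name line starts with `!`, that the value is the single line that follows it, and that no value line itself begins with `!`. The parameter values are placeholders. Like the stylesheet above, it ignores any line that is not part of a name/value pair.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>

    <!-- Placeholder values: adjust the path and the encoding to your file. -->
    <xsl:param name="text-encoding" select="'iso-8859-1'"/>
    <xsl:param name="text-uri" select="'file:///C:/temp/test.txt'"/>

    <xsl:template match="/">
        <!-- Split the file into lines; '\r?\n' accepts both LF and CRLF line endings. -->
        <xsl:variable name="lines"
                      select="tokenize(unparsed-text($text-uri, $text-encoding), '\r?\n')"/>
        <document>
            <xsl:for-each select="$lines">
                <!-- Remember the index of the current line so the next one can be looked up. -->
                <xsl:variable name="pos" select="position()"/>
                <!-- A line starting with '!' names an element; the following line is its value. -->
                <xsl:if test="starts-with(., '!')">
                    <xsl:element name="{normalize-space(substring(., 2))}">
                        <xsl:value-of select="normalize-space($lines[$pos + 1])"/>
                    </xsl:element>
                </xsl:if>
            </xsl:for-each>
        </document>
    </xsl:template>
</xsl:stylesheet>
```

As with the stylesheet above, it can be applied to any XML input, including itself. It still relies on unparsed-text() and tokenize(), so it needs an XSLT 2.0 processor and would not run in an XSLT 1.0 engine.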
Daniel Haley

This XSLT 2.0 transformation:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:variable name="vText" select=
 "replace(unparsed-text('file:///c:/temp/delete/text.txt'),'\r','')"/>

 <xsl:template match="/">
  <document>
      <xsl:analyze-string select="$vText" regex="(!(.+?)\n([^\n]+))+">
       <xsl:matching-substring>
         <xsl:element name="{regex-group(2)}">
                <xsl:sequence select="regex-group(3)"/>
         </xsl:element>
       </xsl:matching-substring>
       <xsl:non-matching-substring><xsl:sequence select="."/></xsl:non-matching-substring>
      </xsl:analyze-string>
  </document>
 </xsl:template>
</xsl:stylesheet>

when applied to any XML document (the input is not used), with the provided text residing in the local file C:\temp\delete\text.txt:

!ITEM_NAME
Item value
!ANOTHER_ITEM
Its value
...

produces the wanted, correct result:

<document>
   <ITEM_NAME>Item value</ITEM_NAME>
   <ANOTHER_ITEM>Its value</ANOTHER_ITEM>
...
</document>

To test more completely, we put this text in the file:

As is text
!ITEM_NAME
Item value
!ANOTHER_ITEM
Its value
As is text2
!TEST_BANG
Here's a value with !bangs!!!
!TEST2_BANG
 !!!Here's a value with !more~ !bangs!!!
As is text3

The transformation again produces the wanted, correct result:

<document>As is text
<ITEM_NAME>Item value</ITEM_NAME>
<ANOTHER_ITEM>Its value</ANOTHER_ITEM>
As is text2
<TEST_BANG>Here's a value with !bangs!!!</TEST_BANG>
<TEST2_BANG> !!!Here's a value with !more~ !bangs!!!</TEST2_BANG>
As is text3
</document>
Dimitre Novatchev
  • Many thanks for all the answers... I just discovered that Visual Studio 2005 (the IDE I am forced to use) XSLT parser doesn't know the unparsed-text function. Sorry for not providing enough context. – sblandin Apr 15 '13 at 06:50
  • Very useful suggestion, sir. – Rudramuni TP Jan 08 '20 at 11:02