Is it possible to parse this kind of text in XSLT?

Détail|Numéro appelé|Date et heure|Quantité réelle|Qantité facturée|H.T.|T.T.C.
Appel vers un portable|0611XXXXXX|14/06 - 09h32|00h00mn23s|00h00mn23s|gratuit|gratuit
Appel vers un portable|0688XXXXXX|14/06 - 10h39|00h01mn16s|00h01mn16s|gratuit|gratuit
Appel vers un portable|0611XXXXXX|18/06 - 07h24|00h00mn50s|00h00mn50s|gratuit|gratuit
Appel vers un portable|0688XXXXXX|20/06 - 09h32|00h00mn23s|00h00mn23s|gratuit|gratuit
Appel vers un portable|0688XXXXXX|20/06 - 10h44|00h01mn27s|00h01mn27s|gratuit|gratuit
Appel vers un portable|0611XXXXXX|25/06 - 21h09|00h00mn22s|00h00mn22s|gratuit|gratuit
Appel vers un portable|0626XXXXXX|29/06 - 11h25|00h00mn27s|00h00mn27s|gratuit|gratuit
Appel vers un portable|0688XXXXXX|02/07 - 13h39|00h02mn37s|00h02mn37s|gratuit|gratuit

This table is contained in a variable, and I want to replace each "|" character with </td><td> (or <th> on the first line, if that is possible) and wrap each line in <tr>.

And... do this with XSLT 1.0.

Thanks a lot.

  • Can you include a sample of your input and desired output directly in the question rather than as a link to pastebin? On StackOverflow we prefer the question to be self contained if possible. – Ian Roberts Oct 09 '12 at 20:51
  • Why bother with XSLT? XSLT is designed to read valid XML. Sure, in XSLT2 you can read just about any input, even non-XML, but for this example you'd be much better off with a simple Perl or Python script. – Jim Garrison Oct 10 '12 at 02:56
  • @JimGarrison I know, but I don't have the choice. –  Oct 10 '12 at 12:35

4 Answers

It will be painful without extensions, and you will have to pass the text in as an external parameter or wrap it in some tags. XSLT is about transforming XML files, not text files, so use the right tool for the job.
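
A minimal sketch of the "external parameter" route, assuming your processor lets the caller set a top-level parameter from outside. The parameter name `text` is only illustrative, and the template body is a placeholder that echoes the text rather than splitting it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Hypothetical top-level parameter: the caller supplies the raw
         pipe-delimited text; it defaults to the empty string. -->
    <xsl:param name="text" select="''"/>

    <xsl:template match="/">
        <!-- Placeholder: echo the text to show that it arrived.
             The real splitting into rows and cells would go here. -->
        <pre><xsl:value-of select="$text"/></pre>
    </xsl:template>
</xsl:stylesheet>
```

With libxslt's command-line tool this could be invoked as something like `xsltproc --stringparam text "$(cat calls.txt)" sheet.xsl dummy.xml`, where the file names are made up for the example. Other processors have their own way of setting parameters.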

Pawel

It's a great deal easier to parse input like this using the regex functionality of XSLT 2.0.
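
For comparison, a rough XSLT 2.0 sketch using the standard `tokenize()` function. It assumes an XSLT 2.0 processor is available and that the text is wrapped in a single `<text>` element; both are assumptions, not something the question states.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Assumed input: <text>line 1&#10;line 2&#10;...</text> -->
    <xsl:template match="/text">
        <table>
            <!-- Split on line ends and skip blank lines -->
            <xsl:for-each select="tokenize(., '\r?\n')[normalize-space()]">
                <tr>
                    <!-- '|' is a regex metacharacter, so it is escaped -->
                    <xsl:for-each select="tokenize(., '\|')">
                        <td><xsl:value-of select="."/></td>
                    </xsl:for-each>
                </tr>
            </xsl:for-each>
        </table>
    </xsl:template>
</xsl:stylesheet>
```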

But if it's really the case that you just want one tr per line, and a td boundary at each vertical bar, you can do that conveniently with two named templates (or with one more complicated template): one to parse the variable and break off one line at a time, passing the line to a second named template, which parses the character sequence into a sequence of td elements. In pseudo-code, something like this:

template name="emit-rows"
  param name="input"
  choose when $input = ''
     <!--* do nothing *-->
  otherwise
     <tr>
      call-template name="emit-columns"
        with-param name="s"
                   value="substring-before(concat($input,'&#xA;'),'&#xA;')"
     </tr>
     call-template name="emit-rows"
       with-param name="input"
                  value="substring-after($input,'&#xA;')"

template name="emit-columns"
  param name="s"
  choose when $s = ''
     <!--* do nothing *-->
  otherwise
     <td>
       <!--* concat() appends a delimiter so the last field is not lost *-->
       value-of substring-before(concat($s,'|'),'|')
     </td>
     call-template name="emit-columns"
        with-param name="s" value="substring-after($s,'|')"
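
The pseudo-code above could be fleshed out into XSLT 1.0 roughly as follows. This is a sketch under stated assumptions, not a tested drop-in: it assumes the text sits in a single `<text>` element, and the extra `cell` parameter (not part of the pseudo-code) is added here to get `th` cells on the first line.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html" indent="yes"/>

    <!-- Assumed input: the raw text wrapped in a single <text> element -->
    <xsl:template match="/">
        <table>
            <xsl:call-template name="emit-rows">
                <xsl:with-param name="input" select="string(/text)"/>
            </xsl:call-template>
        </table>
    </xsl:template>

    <!-- Emits one tr per line; only the first line gets th cells -->
    <xsl:template name="emit-rows">
        <xsl:param name="input"/>
        <xsl:param name="cell" select="'th'"/>
        <xsl:if test="$input != ''">
            <tr>
                <xsl:call-template name="emit-columns">
                    <!-- concat() guarantees a newline, so the last line is kept -->
                    <xsl:with-param name="s"
                        select="substring-before(concat($input, '&#xA;'), '&#xA;')"/>
                    <xsl:with-param name="cell" select="$cell"/>
                </xsl:call-template>
            </tr>
            <xsl:call-template name="emit-rows">
                <xsl:with-param name="input" select="substring-after($input, '&#xA;')"/>
                <xsl:with-param name="cell" select="'td'"/>
            </xsl:call-template>
        </xsl:if>
    </xsl:template>

    <!-- Emits one cell per '|'-separated field -->
    <xsl:template name="emit-columns">
        <xsl:param name="s"/>
        <xsl:param name="cell" select="'td'"/>
        <xsl:if test="$s != ''">
            <xsl:element name="{$cell}">
                <xsl:value-of select="substring-before(concat($s, '|'), '|')"/>
            </xsl:element>
            <xsl:call-template name="emit-columns">
                <xsl:with-param name="s" select="substring-after($s, '|')"/>
                <xsl:with-param name="cell" select="$cell"/>
            </xsl:call-template>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
```

If the text lives in a variable rather than in the source document, the `select` of the first `xsl:with-param` can point at that variable instead of `string(/text)`. The recursion goes one level deeper per line, so very long inputs may run into a processor's recursion limit.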
C. M. Sperberg-McQueen

Here's a simple Perl script to do what you want

use strict;
use warnings;

my $cell = "th";              # header cells for the first line only
while (my $line = <>)
{
    $line =~ s/[\r\n]*$//;    # strip the trailing CR/LF
    print "<tr>\n";
    for my $f (split /\|/, $line)
    {
        print "<$cell>$f</$cell>\n";
    }
    print "</tr>\n";
    $cell = "td";             # data cells for every later line
}

You may have to fiddle with character encodings depending on where you run it (Windows vs Linux) to ensure that your accented characters aren't mangled. I'm leaving that as an exercise for you.

Here's the output from the first 3 lines of your input (note the mangled accented characters):

<tr>
<th>D▒tail</th>
<th>Num▒ro appel▒</th>
<th>Date et heure</th>
<th>Quantit▒ r▒elle</th>
<th>Qantit▒ factur▒e</th>
<th>H.T.</th>
<th>T.T.C.</th>
</tr>
<tr>
<td>Appel vers un portable</td>
<td>06110XXXXX</td>
<td>14/06 - 09h32</td>
<td>00h00mn23s</td>
<td>00h00mn23s</td>
<td>gratuit</td>
<td>gratuit</td>
</tr>
<tr>
<td>Appel vers un portable</td>
<td>06889XXXXX</td>
<td>14/06 - 10h39</td>
<td>00h01mn16s</td>
<td>00h01mn16s</td>
<td>gratuit</td>
<td>gratuit</td>
</tr>
Jim Garrison

As others have said, this is definitely not a job for XSLT; however, just for fun, it can be done using XSLT 1.0. This was tested using XMLSpy, so you might need EXSLT to convert the result tree fragments to node-sets, but the principle is the same. For the technical solution, jump down past the next few paragraphs.

Edit (based on a comment made on the question): I don't have enough rep to comment in the right place; however, this question is a perfect example of when a technological solution isn't enough.

You've said that you "don't have a choice" about using XSLT, but I think that highlights a problem I've seen in a lot of IT professionals. You may not have a choice regarding the current state of your environment at work (I'm presuming this is a work solution, because no right-thinking educator would use this as an example of how to use XSLT). What you do have a choice in is how you approach delivering this solution to whoever has requested it.

The real answer to this question is that XSLT is most certainly not the correct way to solve this problem, and that is what your employer or client needs to be told. While my solution below is a perfectly valid approach to solving your problem, I would suggest that it is not best practice, not an appropriate use of XSLT, and not the most efficient way of solving this problem, and in a professional environment you have a responsibility to say the same things.

People may not appreciate being told that their environment is wrong, or that how they want something solved isn't how it should be solved, but if they are coming to you for advice, then your advice should include critique where appropriate. So by all means, present the solution below, but be sure to present the arguments above. That way, even if they implement it in XSLT, at least you have done your duty to inform and educate, and if that critique is documented and they eventually end up using a poor solution, it won't be your responsibility if it breaks.

Actual XSLT solution starts here:

Input (note that you need a tag around the code for it to be valid XML):

<text>Détail|Numéro appelé|Date et heure|Quantité réelle|Qantité facturée|H.T.|T.T.C.
Appel vers un portable|06110XXXXX|14/06 - 09h32|00h00mn23s|00h00mn23s|gratuit|gratuit
Appel vers un portable|06889XXXXX|14/06 - 10h39|00h01mn16s|00h01mn16s|gratuit|gratuit
Appel vers un portable|06110XXXXX|18/06 - 07h24|00h00mn50s|00h00mn50s|gratuit|gratuit
Appel vers un portable|06889XXXXX|20/06 - 09h32|00h00mn23s|00h00mn23s|gratuit|gratuit
Appel vers un portable|06889XXXXX|20/06 - 10h44|00h01mn27s|00h01mn27s|gratuit|gratuit
Appel vers un portable|06110XXXXX|25/06 - 21h09|00h00mn22s|00h0n0mn22s|gratuit|gratuit
Appel vers un portable|06267XXXXX|29/06 - 11h25|00h00mn27s|00h00mn27s|gratuit|gratuit
Appel vers un portable|06889XXXXX|02/07 - 13h39|00h02mn37s|00h02mn37s|gratuit|gratuit
Appel vers un portable|06889XXXXX|02/07 - 18h17|00h06mn55s|00h06mn55s|gratuit|gratuit
Appel vers un portable|06110XXXXX|05/07 - 19h29|00h00mn15s|00h00mn15s|gratuit|gratuit</text>

XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <xsl:variable name="lines">
            <xsl:call-template name="tokenize">
                <xsl:with-param name="string">
                    <xsl:value-of select="."/>
                </xsl:with-param>
                <xsl:with-param name="token">
                    <xsl:value-of select="'&#10;'"/>
                </xsl:with-param>
            </xsl:call-template>
        </xsl:variable>
        <table>
            <xsl:for-each select="$lines/match">
                <xsl:variable name="cells">
                    <xsl:call-template name="tokenize">
                        <xsl:with-param name="string">
                            <xsl:value-of select="."/>
                        </xsl:with-param>
                        <xsl:with-param name="token">
                            <xsl:value-of select="'|'"/>
                        </xsl:with-param>
                    </xsl:call-template>
                </xsl:variable>
                <tr>
                    <xsl:for-each select="$cells/match">
                        <td><xsl:value-of select="."/></td>
                    </xsl:for-each>
                </tr>
            </xsl:for-each>
        </table>
    </xsl:template>
    <!--
        Tokenize with a string and token allows us to split up string on a given token and return a node-set of all of the separate components in <match> tags.
        Taken from: http://stackoverflow.com/a/141022/764357
        Then modified to use a generic split token.
    -->
    <xsl:template name="tokenize">
        <xsl:param name="string"/>
        <xsl:param name="token" select="','"/>
        <xsl:param name="count" select="0"/>
        <xsl:variable name="first_elem" select="substring-before(concat($string,$token), $token)"/>
        <!-- Make sure at least one token at the end exists -->
        <xsl:variable name="remaining" select="substring-after($string, $token)"/>
        <match>
            <xsl:value-of select="$first_elem"/>
        </match>
        <!--
            We check that the remaining list is not just a single token, if it is then the recursive base case has been identified.
        -->
        <xsl:if test="$remaining and $remaining != $token">
            <xsl:call-template name="tokenize">
                <xsl:with-param name="string" select="$remaining"/>
                <xsl:with-param name="token" select="$token"/>
                <xsl:with-param name="count" select="$count + 1"/>
            </xsl:call-template>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>

And the output:

<table>
    <tr>
        <td>Détail</td>
        <td>Numéro appelé</td>
        <td>Date et heure</td>
        <td>Quantité réelle</td>
        <td>Qantité facturée</td>
        <td>H.T.</td>
        <td>T.T.C.</td>
    </tr>
    <tr>
        <td>Appel vers un portable</td>
        <td>06110XXXXX</td>
        <td>14/06 - 09h32</td>
        <td>00h00mn23s</td>
        <td>00h00mn23s</td>
        <td>gratuit</td>
        <td>gratuit</td>
    </tr>
    <tr>
        <td>Appel vers un portable</td>
        <td>06889XXXXX</td>
        <td>14/06 - 10h39</td>
        <td>00h01mn16s</td>
        <td>00h01mn16s</td>
        <td>gratuit</td>
        <td>gratuit</td>
    </tr>
    <tr>
        <td>Appel vers un portable</td>
        <td>06110XXXXX</td>
        <td>18/06 - 07h24</td>
        <td>00h00mn50s</td>
        <td>00h00mn50s</td>
        <td>gratuit</td>
        <td>gratuit</td>
    </tr>
    <tr>
        <td>Appel vers un portable</td>
        <td>06889XXXXX</td>
        <td>20/06 - 09h32</td>
        <td>00h00mn23s</td>
        <td>00h00mn23s</td>
        <td>gratuit</td>
        <td>gratuit</td>
    </tr>
    <tr>
        <td>Appel vers un portable</td>
        <td>06889XXXXX</td>
        <td>20/06 - 10h44</td>
        <td>00h01mn27s</td>
        <td>00h01mn27s</td>
        <td>gratuit</td>
        <td>gratuit</td>
    </tr>
    <tr>
        <td>Appel vers un portable</td>
        <td>06110XXXXX</td>
        <td>25/06 - 21h09</td>
        <td>00h00mn22s</td>
        <td>00h0n0mn22s</td>
        <td>gratuit</td>
        <td>gratuit</td>
    </tr>
    <tr>
        <td>Appel vers un portable</td>
        <td>06267XXXXX</td>
        <td>29/06 - 11h25</td>
        <td>00h00mn27s</td>
        <td>00h00mn27s</td>
        <td>gratuit</td>
        <td>gratuit</td>
    </tr>
    <tr>
        <td>Appel vers un portable</td>
        <td>06889XXXXX</td>
        <td>02/07 - 13h39</td>
        <td>00h02mn37s</td>
        <td>00h02mn37s</td>
        <td>gratuit</td>
        <td>gratuit</td>
    </tr>
    <tr>
        <td>Appel vers un portable</td>
        <td>06889XXXXX</td>
        <td>02/07 - 18h17</td>
        <td>00h06mn55s</td>
        <td>00h06mn55s</td>
        <td>gratuit</td>
        <td>gratuit</td>
    </tr>
    <tr>
        <td>Appel vers un portable</td>
        <td>06110XXXXX</td>
        <td>05/07 - 19h29</td>
        <td>00h00mn15s</td>
        <td>00h00mn15s</td>
        <td>gratuit</td>
        <td>gratuit</td>
    </tr>
</table>
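
If your processor rejects `$lines/match` because `$lines` is a result tree fragment, one possible variant wraps the variables in the EXSLT `exsl:node-set()` function. This is a sketch that assumes the processor implements the EXSLT common module (libxslt does, for example; Microsoft's processors offer `msxsl:node-set` instead). The `th` handling for the first row and the `$row` variable are additions for illustration, and the tokenize template is a condensed form of the one above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

    <xsl:template match="/">
        <!-- Result tree fragment holding one <match> per line -->
        <xsl:variable name="lines">
            <xsl:call-template name="tokenize">
                <xsl:with-param name="string" select="string(.)"/>
                <xsl:with-param name="token" select="'&#10;'"/>
            </xsl:call-template>
        </xsl:variable>
        <table>
            <!-- exsl:node-set() turns the fragment into a navigable node-set -->
            <xsl:for-each select="exsl:node-set($lines)/match">
                <xsl:variable name="cells">
                    <xsl:call-template name="tokenize">
                        <xsl:with-param name="string" select="string(.)"/>
                        <xsl:with-param name="token" select="'|'"/>
                    </xsl:call-template>
                </xsl:variable>
                <!-- Remember the row number before the inner for-each changes position() -->
                <xsl:variable name="row" select="position()"/>
                <tr>
                    <xsl:for-each select="exsl:node-set($cells)/match">
                        <xsl:choose>
                            <xsl:when test="$row = 1"><th><xsl:value-of select="."/></th></xsl:when>
                            <xsl:otherwise><td><xsl:value-of select="."/></td></xsl:otherwise>
                        </xsl:choose>
                    </xsl:for-each>
                </tr>
            </xsl:for-each>
        </table>
    </xsl:template>

    <!-- Same splitting idea as the tokenize template above, condensed -->
    <xsl:template name="tokenize">
        <xsl:param name="string"/>
        <xsl:param name="token"/>
        <match>
            <xsl:value-of select="substring-before(concat($string, $token), $token)"/>
        </match>
        <xsl:variable name="remaining" select="substring-after($string, $token)"/>
        <xsl:if test="$remaining != ''">
            <xsl:call-template name="tokenize">
                <xsl:with-param name="string" select="$remaining"/>
                <xsl:with-param name="token" select="$token"/>
            </xsl:call-template>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
```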
  • Thanks, this code doesn't seem to use external libraries, so I tried to test it in the W3Schools online editor (http://www.w3schools.com/xsl/tryxslt.asp?xmlfile=cdcatalog&xsltfile=cdcatalog) but it does not work. Did I forget something? –  Oct 10 '12 at 09:42
  • As I said, it may require you to use XSLT extensions, which are available in all major engines. XMLSpy converts tree fragments to node-sets automatically (and incorrectly), which may cause errors. However, when I tried to use the code in the W3Schools link you provided it did error, but on the line that contains the newline entity (`&#10;`), which leads me to think it's an encoding issue on their end. For future reference, always be wary when using W3Schools documentation or live tools, as they have a bad rep for getting stuff wrong - http://w3fools.com/ –  Oct 10 '12 at 22:42
  • Hi, is it possible to apply this template on a variable like `` ? Thanks –  Oct 16 '12 at 15:18
  • It is possible, you use that in place of `` if that element has the data. **However, I strongly recommend you read the additions I made to my answer.** XSLT is the wrong tool for the job you are trying to solve, and if you can't use another tool you should at the least raise this fact with whoever is requesting this work be done using XSLT. –  Oct 16 '12 at 23:04