
I've got an XML feed coming from Twitter which I want to transform using XSLT. What I want the XSLT to do is replace every URL occurring in a Twitter message with an HTML link. I've already created the following XSLT template, based on two related topics here on Stack Overflow. How can I achieve this? If I use the template as below, I get an infinite loop, but I don't see where. As soon as I comment out the call to the 'replaceAll' template, everything seems to work, but then of course none of the content of the Twitter message gets replaced. I'm new to XSLT, so every bit of help is welcome.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
    <xsl:output method="text" omit-xml-declaration="yes" indent="yes"  encoding="utf-8" />
    <xsl:param name="html-content-type" />
    <xsl:variable name="urlRegex" select="8"/>
    <xsl:template match="statuses">
        <xsl:for-each select="//status[position() &lt; 2]">
            <xsl:variable name="TwitterMessage" select="text" />
            <xsl:call-template name="replaceAll">
                <xsl:with-param name="text" select="$TwitterMessage"/>
                <xsl:with-param name="replace" select="De"/> <!--This should become a regex to replace URLs, maybe something like the rule below?-->
                <xsl:with-param name="by" select="FOOOO"/> <!--Here I want the matching regex value to be replaced with valid HTML to create an href-->
                <!--<xsl:value-of select="replace(text,'^http://(.*)\.com','#')"/>
                <xsl:value-of select="text"/>-->
            </xsl:call-template>
            <!--<xsl:value-of select="text"/>-->
            <!--<xsl:apply-templates />-->
        </xsl:for-each>
    </xsl:template>

    <xsl:template name="replaceAll">
        <xsl:param name="text"/>
        <xsl:param name="replace"/>
        <xsl:param name="by"/>
        <xsl:choose>
            <xsl:when test="contains($text,$replace)">
                <xsl:value-of select="substring-before($text,$replace)"/>
                <xsl:value-of select="$by"/>
                <xsl:call-template name="replaceAll">
                    <xsl:with-param name="text" select="substring-after($text,$replace)"/>
                    <xsl:with-param name="replace" select="$replace"/>
                    <xsl:with-param name="by" select="$by"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$text"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
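A likely cause of the infinite loop: `select="De"` and `select="FOOOO"` are XPath expressions, not strings. Inside the for-each they select child elements named `De` and `FOOOO` of the current `<status>`. Those elements don't exist, so `$replace` ends up as the empty string. In XPath 1.0, `contains($text, '')` is always true and `substring-after($text, '')` returns the whole of `$text`, so every recursive call gets the same text back. Quoting the literals (`select="'De'"`, `select="'FOOOO'"`) avoids this. The Python function below is only a model of the template's logic (`replace_all` is an illustrative name, not part of any library), with an explicit guard for the empty pattern:

```python
def replace_all(text: str, replace: str, by: str) -> str:
    """Model of the recursive XSLT 'replaceAll' template (illustrative only)."""
    # The `replace` truthiness check is the guard: in XPath 1.0, contains($text, '')
    # is true and substring-after($text, '') returns $text unchanged, so an empty
    # pattern would make every recursive call see the same text forever.
    if replace and replace in text:
        # str.partition splits at the first occurrence, like
        # substring-before(...) and substring-after(...) in the template.
        before, _, after = text.partition(replace)
        return before + by + replace_all(after, replace, by)
    # xsl:otherwise: nothing (left) to replace, emit the text as-is.
    return text


print(replace_all("De Twitter De", "De", "FOOOO"))  # FOOOO Twitter FOOOO
```

Note that the template only does plain substring replacement; matching URLs still needs a regex, which XPath 1.0 doesn't have.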

EDIT: This is an example of the XML feed:

<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
<status>
  <created_at>Mon May 16 14:17:12 +0000 2011</created_at>
  <id>10000000000000000</id>
  <text>This is an message from Twitter http://bit.ly/xxxxx http://yfrog.com/xxxxx</text>
</status>
</statuses>

This is just the basic XML that Twitter outputs at a URL like the one below:

http://twitter.com/statuses/user_timeline.xml?screen_name=yourtwitterusername
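The selection half of the question's stylesheet (the for-each over `//status[position() < 2]` and its `text` child) can be sketched in Python with `xml.etree.ElementTree`, assuming the feed is well-formed, i.e. every `<status>` is closed and the document ends with `</statuses>`. In a flat feed like this one, that XPath picks only the first `<status>`. `first_status_text` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Well-formed version of the sample feed (one <status>, properly closed).
FEED = """<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
<status>
  <created_at>Mon May 16 14:17:12 +0000 2011</created_at>
  <id>10000000000000000</id>
  <text>This is an message from Twitter http://bit.ly/xxxxx http://yfrog.com/xxxxx</text>
</status>
</statuses>"""


def first_status_text(feed_xml: bytes) -> str:
    """Return the <text> of the first <status>, or '' if there is none."""
    root = ET.fromstring(feed_xml)  # root element is <statuses>
    # First <status> child: what //status[position() < 2] selects in a flat feed.
    status = root.find("status")
    if status is None:
        return ""
    return status.findtext("text", default="")


print(first_status_text(FEED.encode("utf-8")))
```

Passing bytes rather than a `str` lets the parser honour the `encoding="UTF-8"` declaration.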

This text:

This is an message from Twitter http://bit.ly/xxxxx http://yfrog.com/xxxxx

should be converted to:

This is an message from Twitter <a href="http://bit.ly/xxxxx">http://bit.ly/xxxxx</a> <a href="http://yfrog.com/xxxxx">http://yfrog.com/xxxxx</a>
Rob
  • Have you considered that you might be using the wrong technology? XSLT is great at transforming the structure of XML, but terrible at modifying its content! For this sort of task I would use something like Linq-to-XML so that I can use C# code for making these changes. – ColinE May 23 '11 at 12:16
  • @ColinE, good point! The only problem here is that I'm working with a standard CMS component that provides me this data. But I will discuss this with the project team. Do you have any other ideas on how to solve this using the technologies mentioned? – Rob May 23 '11 at 12:22
  • Could you provide a bit of your XML input? – Emiliano Poggi May 23 '11 at 12:30
  • @empo, added an example. – Rob May 23 '11 at 12:41
  • This is not clear. Could you please provide just the source text, the resulting text you want, and explain the rules for the replacement operation? I would recommend using XSLT 2.0, which together with XPath 2.0 has support for regular expression processing. – Dimitre Novatchev May 23 '11 at 12:54
  • @Dimitre, what I want is simple. In my last example I added the XML feed part from Twitter. I want every URL in the text node of the XML feed replaced with an actual link instead of plain text. This needs to be done using XSLT. – Rob May 23 '11 at 13:47
  • @Rob: Please, a single example: This text: "xxx yyy ..." must be converted to this text: "AAa bbb ...." – Dimitre Novatchev May 23 '11 at 14:37
  • @Dimitre, question is updated. Hope it's clearer now. – Rob May 23 '11 at 14:50
  • @Rob: Only if you provide the rules for the syntax of a URL does this become a well-defined problem. In particular, in your updated question it is not clear where a URL ends. I have met a RegEx claiming to match exactly any URL; this could be adapted to the `matches()` and/or `replace()` functions of XPath 2.0, and the processing with XSLT 2.0 shouldn't be difficult. To repeat and summarize: what is missing is a strict definition of the syntax of a "URL". – Dimitre Novatchev May 23 '11 at 16:03

2 Answers


Generally, I wouldn't implement a new replace function; I'd use the one provided by EXSLT. If your XSLT processor supports EXSLT, you just need to declare the namespace on the stylesheet element as follows:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:regexp="http://exslt.org/regular-expressions"
                extension-element-prefixes="regexp"
                version="1.0">

Otherwise, download and import the stylesheet from EXSLT.

For a global replace you can use the function as follows:

<xsl:value-of select="regexp:replace(string($TwitterMessage), 'yourpattern', 'g', 'yourreplace')" />
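To make the flags argument concrete, here is a hypothetical Python analogue (not EXSLT itself) that takes the same four arguments in the same order as `regexp:replace`: with `'g'` every match is replaced, without it only the first one; `'i'` switches on case-insensitive matching. `exslt_style_replace` is an illustrative name, and this sketch inserts the replacement literally, without back-references.

```python
import re


def exslt_style_replace(text: str, pattern: str, flags: str, replacement: str) -> str:
    """Hypothetical analogue of regexp:replace(string, regex, flags, replace)."""
    # 'g' = replace every match; without it only the first match is replaced.
    # re.sub treats count=0 as "no limit".
    count = 0 if "g" in flags else 1
    # 'i' = case-insensitive matching.
    re_flags = re.IGNORECASE if "i" in flags else 0
    # A function replacement makes re.sub insert `replacement` literally,
    # without interpreting backslashes or group references.
    return re.sub(pattern, lambda _m: replacement, text, count=count, flags=re_flags)


print(exslt_style_replace("a1b2c3", r"\d", "g", "#"))  # a#b#c#
print(exslt_style_replace("a1b2c3", r"\d", "", "#"))   # a#b2c3
```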

Sorry for the general answer, but I'm not able to test XSLT at the moment.

Emiliano Poggi

So, your question isn't really about XSLT. What you want is to find the best option for manipulating a text string in XPath. If you are using a standalone XSLT engine, you can probably use XPath 2.0, which just about has the power you need, though with regexes it will get a bit fiddly. If you are running this from an engine with EXSLT support, you will need to look up which functions are available there. If you are running this from PHP, text manipulation is generally best handed over to the PHP code: you do that by writing a PHP function that does what you want and calling it from the XSLT using php:function('f-name', inputs ...) as the XPath expression.

As far as regexes go, I guess you are looking for something pretty much along these lines:

Pattern:     (https?://.*?)(?=[.,:;)]*($|\s))
Replacement: <a href="$1">$1</a>

It doesn't have to match every possible URL; you only need to handle the incoming data as well as Twitter's own munging does. Checking for punctuation at the end (the [...] character class in the regex) is really the only tricky thing that your users will expect you to do.
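A quick way to check this regex against the example from the question is Python's `re` module, which, like JavaScript, supports the `(?=...)` lookahead. This sketch makes the inner group non-capturing and HTML-escapes the URL before inserting it; both are additions for the sketch, and `linkify` is a hypothetical name. Note that the regex dialect of XPath 2.0's `replace()` has no lookahead, so the pattern can't be dropped into XSLT 2.0 unchanged.

```python
import html
import re

# The answer's pattern; the inner ($|\s) group is made non-capturing here.
URL_RE = re.compile(r"(https?://.*?)(?=[.,:;)]*(?:$|\s))")


def linkify(text: str) -> str:
    """Wrap each URL found by URL_RE in an <a href="..."> element."""
    def to_anchor(match):
        # Escape &, <, > and quotes so the URL is safe inside the attribute.
        url = html.escape(match.group(1), quote=True)
        return f'<a href="{url}">{url}</a>'

    return URL_RE.sub(to_anchor, text)


tweet = "This is an message from Twitter http://bit.ly/xxxxx http://yfrog.com/xxxxx"
print(linkify(tweet))
print(linkify("see http://example.com/x."))  # trailing '.' stays outside the link
```

The lazy `.*?` stops at the first point where only optional trailing punctuation and then whitespace or the end of the string follow, which is why the period in the second call is left outside the `<a>` element.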

Nicholas Wilson
  • In the end I didn't use XSLT but JavaScript, which works for now. Not the most elegant solution, but for the moment the easiest, since I'm up against a deadline for this project. Your answer was the closest because the regex does exactly what I needed. – Rob May 24 '11 at 07:28