
I need to perform a regular-expression-style replacement on the attributes in an MRSS RSS feed, stripping the query string from each URL so that only the bare URL remains. I've tried a few things using the suggestions from "XSLT Replace function not found", but to no avail.

<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
<channel>
<atom:link href="http://www.videojug.com/user/metacafefamilyandeducation/subscriptions.mrss" type="application/rss+xml" rel="self" />
<title>How to and instructional videos from Videojug.com</title>
<description>Award-winning Videojug.com has over 50k professionally-made instructional videos.</description>
<link>http://www.videojug.com</link>
<item>
  <title>How To Calculate Median</title>
  <media:content url="http://direct.someurl.com/54/543178dd-11a7-4b8d-764c-ff0008cd2e95/how-to-calculate-median__VJ480PENG.mp4?somequerystring" type="video/mp4" bitrate="1200" height="848" duration="169" width="480">
    <media:title>How To Calculate Median</media:title>
    ..
  </media:content>
</item>
</channel>
</rss>

Any suggestions would be really helpful.


2 Answers


If you're using XSLT 2.0, you can use tokenize() to split the URL on ? and keep the first part. This template outputs just that trimmed value:

  <xsl:template match="media:content">
    <xsl:value-of select="tokenize(@url,'\?')[1]"/>
  </xsl:template>

Here's another example that changes only the url attribute of media:content and copies its other attributes and children:

  <xsl:template match="media:content">
    <media:content url="{tokenize(@url,'\?')[1]}">
      <xsl:copy-of select="@*[not(name()='url')]"/>
      <xsl:apply-templates/>
    </media:content>
  </xsl:template>

EDIT

To handle all url attributes in your instance, and leave everything else unchanged, use an identity transform and only override it with a template for @url.

Here's a modified version of your sample XML. I've added two attributes to description for testing. The attr attribute should be left untouched and the url attribute should be processed.

XML

<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <atom:link href="http://www.videojug.com/user/metacafefamilyandeducation/subscriptions.mrss" type="application/rss+xml" rel="self"/>
    <title>How to and instructional videos from Videojug.com</title>
    <!-- added some attributes for testing -->
    <description attr="don't delete me!" url="http://www.test.com/foo?anotherquerystring">Award-winning Videojug.com has over 50k professionally-made instructional videos.</description>
    <link>http://www.videojug.com</link>
    <item>
      <title>How To Calculate Median</title>
      <media:content url="http://direct.someurl.com/54/543178dd-11a7-4b8d-764c-ff0008cd2e95/how-to-calculate-median__VJ480PENG.mp4?somequerystring" type="video/mp4" bitrate="1200" height="848"
        duration="169" width="480">
        <media:title>How To Calculate Median</media:title>
        .. 
      </media:content>
    </item>
  </channel>
</rss>

XSLT

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!--Identity Transform-->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!--Strip the query string from every url attribute-->
  <xsl:template match="@url">
    <xsl:attribute name="url">
      <xsl:value-of select="tokenize(.,'\?')[1]"/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>

OUTPUT (Using Saxon 9.3.0.5)

<rss xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:media="http://search.yahoo.com/mrss/"
     version="2.0">
   <channel>
      <atom:link href="http://www.videojug.com/user/metacafefamilyandeducation/subscriptions.mrss"
                 type="application/rss+xml"
                 rel="self"/>
      <title>How to and instructional videos from Videojug.com</title>
      <!-- added some attributes for testing --><description attr="don't delete me!" url="http://www.test.com/foo">Award-winning Videojug.com has over 50k professionally-made instructional videos.</description>
      <link>http://www.videojug.com</link>
      <item>
         <title>How To Calculate Median</title>
         <media:content url="http://direct.someurl.com/54/543178dd-11a7-4b8d-764c-ff0008cd2e95/how-to-calculate-median__VJ480PENG.mp4"
                        type="video/mp4"
                        bitrate="1200"
                        height="848"
                        duration="169"
                        width="480">
            <media:title>How To Calculate Median</media:title>
        .. 
      </media:content>
      </item>
   </channel>
</rss>
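Since the question asks for a regular-expression-style replacement, here is a sketch of the same identity transform using XSLT 2.0's replace() instead of tokenize(). It assumes an XSLT 2.0 processor (for example Saxon, as used above); the pattern '\?.*' (a literal ? followed by everything after it) is my own choice rather than part of the answer:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity transform: copy every node and attribute unchanged by default -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Regex variant: delete the first "?" and everything after it.
       xsl:attribute/@select is XSLT 2.0 only. -->
  <xsl:template match="@url">
    <xsl:attribute name="url" select="replace(., '\?.*', '')"/>
  </xsl:template>
</xsl:stylesheet>
```

For a URL without a query string the pattern does not match, so the value is copied unchanged, which is the same result tokenize(.,'\?')[1] gives.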
Daniel Haley
  • OK, that looks good, but there may be other elements in this file that also have url attributes, and I want to trim ALL of these attribute values. As I understand it, if I change the match to @url it will match just that attribute. I'm unclear on how to ensure that, when I write it back, it overwrites only the attribute and preserves the rest of the element. – RichHalliwell May 27 '11 at 09:27
  • @RichHalliwell: You can make sure you overwrite only the url attribute by using an identity transform to handle everything else (other elements, attributes, text, etc.). Please see my edit for an example. – Daniel Haley May 27 '11 at 15:03

String handling is generally much easier in XSLT 2.0, but in this case the requirement looks easy enough to meet with the substring-before() function, which has been available since XSLT 1.0.
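A minimal XSLT 1.0 sketch of that approach, reusing the identity transform from the other answer and swapping tokenize() for substring-before(). One thing to watch: substring-before() returns an empty string when the separator is not found, so URLs without a query string have to be copied as-is. This should work in any XSLT 1.0 processor, but treat it as a sketch rather than a tested drop-in:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity transform: copy every node and attribute unchanged by default -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Trim the query string from every url attribute -->
  <xsl:template match="@url">
    <xsl:attribute name="url">
      <xsl:choose>
        <!-- substring-before() yields "" if "?" is absent, so test first -->
        <xsl:when test="contains(., '?')">
          <xsl:value-of select="substring-before(., '?')"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="."/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
```

A more compact equivalent is substring-before(concat(., '?'), '?'): appending a ? guarantees the separator is present, so a URL with no query string comes back whole.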

Michael Kay