I have the following line of code in an HTML file (or something similar):

...
<a href="#SCRIPT_NAME#?a=b&id=a/b/c/d">Link Content</a>
...

I need to be able to extract the a/b/c/d part of the href and convert the link to something like:

<a href="/lookup?id=a/b/c/d">Link Content</a>

Ideally I'd like to be able to do this with regex, but most of the regex stuff I've seen for XSLT on StackOverflow seems to require XPath 2.

Ah yes... I'm using SimpleXML/DomDocument on PHP 5.3 to apply the stylesheet, which I believe doesn't support XSLT 2.0.

I think I could do string replacement to lose the first part, but I'd like to have a pattern match to extract it.

Any thoughts?

Nick

3 Answers

1

most of the regex stuff I've seen for XSLT on StackOverflow seems to require XPath 2.

Not most: all. Unless your specific XSLT 1.0 processor offers regex as a (processor-specific) extension.

Now, the part missing from your question is how to recognize the part that you want to extract from the existing value. If, for example, it is always the substring that comes after (the first occurrence of) "id=", then you could use the substring-after() function to retrieve it.

Or at least in theory you could. In practice, nothing will work with the given example, because it contains an unescaped & character - a big no-no in XML.
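As a minimal sketch of the substring-after() approach, assuming the ampersand has already been escaped as `&amp;` in the input and that the wanted value is everything after the first "id=", a stylesheet along these lines would print the extracted part of each link. The text output and the `//a` selection are illustrative choices, not something taken from the question.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- For every <a> whose href contains "id=", print what follows it -->
  <xsl:template match="/">
    <xsl:for-each select="//a[contains(@href, 'id=')]">
      <!-- substring-after returns the text after the FIRST occurrence of 'id=' -->
      <xsl:value-of select="substring-after(@href, 'id=')"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Note that a plain 'id=' test would also match inside a longer parameter name such as "uid=", so the search string may need tightening for real input.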

michael.hor257k
  • Thanks Michael, I'll have a look into that. You are right that I need the ID= part of the query string. – Nick Feb 20 '15 at 06:40
  • I also do encode that before parsing as xml. Typo on writing this last night. I str_replace `&` with `&amp;` on the way in and vice versa on the way out. – Nick Feb 20 '15 at 06:42
1

As already pointed out in the answer given by michael.hor257k, you have to adjust the & character to have valid XML. Given an input containing for example

<a href="#SCRIPT_NAME#?a=b&amp;id=a/b/c/d">Link Content</a>

the following template

<xsl:template match="a/@href[starts-with(.,'#SCRIPT_NAME#')]">
  <xsl:attribute name="href">
    <xsl:value-of select="concat('/lookup?id=', substring-after(.,'id='))"/>
  </xsl:attribute>
</xsl:template>

changes the link to

<a href="/lookup?id=a/b/c/d">Link Content</a>

matching every href starting with #SCRIPT_NAME#.
It's not clear from the question which part has to be matched or how to identify the links that need adjusting, but you can probably adapt this example to your requirements, or add further detail to your question.
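For the template above to leave the rest of the document intact, it is normally paired with an identity template. The following is a sketch of a complete XSLT 1.0 stylesheet under that assumption. The `$rest` variable and the handling of parameters that might follow the id are additions for illustration and are not part of the original answer.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override only the href attributes that start with the placeholder -->
  <xsl:template match="a/@href[starts-with(., '#SCRIPT_NAME#')]">
    <!-- Everything after the first "id=" (hypothetical helper variable) -->
    <xsl:variable name="rest" select="substring-after(., 'id=')"/>
    <xsl:attribute name="href">
      <!-- Appending an ampersand guarantees substring-before finds a terminator,
           so only the id value is kept even if more parameters follow it -->
      <xsl:value-of select="concat('/lookup?id=', substring-before(concat($rest, '&amp;'), '&amp;'))"/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
```

The more specific match pattern takes precedence over the identity template for the matching href attributes, so everything else passes through unchanged.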

matthias_h
1

This is just a shot in the dark, but if you are specifically looking to solve this with a regex, you may be able to use something like the following:

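// Sample input taken from the question
$xslt_string = '<a href="#SCRIPT_NAME#?a=b&id=a/b/c/d">Link Content</a>';
// Capture everything between "id=" and the closing quote of the href
preg_match('/href=".+?id=(.+?)"/', $xslt_string, $matches);
// $matches[1] holds the captured id value
print_r($matches);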

https://regex101.com/r/rY7oY7/1

jbiz
  • I'm aware I could do this in php natively, very easily. However I was hoping to contain all my "translation" code in the xslt and not "do some over there and some over here". Eventually this would end up in Drupal so I could also use an input filter to correct these on output. As with all things PHP there are many ways to skin a cat :) – Nick Feb 20 '15 at 06:44
  • I don't think this is a good idea, because before applying regex to a string, you would have to locate that string. IOW, you would have to *parse* the input XML - and [everyone knows](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) that you can't parse X/HTML with regex. – michael.hor257k Feb 20 '15 at 08:08