-2

I have a .js file containing JavaScript like the sample below. I want to extract all of the href URLs and add each one to a variable inside a loop for further processing. How can I do this? Thanks very much.

 document.write('<tr bgcolor="#6691BC">');
 document.write('<td width="15" height="25">&nbsp;</td>');
 document.write('<td width="690" height="25" class="headertext">');

 document.write('<a href="../myspace.com/index.html" class="headerLink" style="color: #ffffff;">My Space</a>&nbsp;&nbsp;|&nbsp;&nbsp;');

 document.write('<a href="../technotes.com/index.html" class="headerLink" style="color: #ffffff;">Tech Notes</a>&nbsp;&nbsp;|&nbsp;&nbsp;');

 document.write('<td width="15" height="25">&nbsp;</td>');
 document.write('</tr>');
dotnet user

2 Answers

1

I would take a different approach: first convert your HTML into a single well-formed XHTML string. Note that the original is missing a closing </td>, and each & needs to be escaped as &amp;.

var xhtml = [
'<tr bgcolor="#6691BC">',
  '<td width="15" height="25">&amp;nbsp;</td>',
  '<td width="690" height="25" class="headertext">',
    '<a href="../myspace.com/index.html" class="headerLink" style="color: #ffffff;">My Space</a>&amp;nbsp;&amp;nbsp;|',
    '<a href="../technotes.com/index.html" class="headerLink" style="color: #ffffff;">Tech Notes</a>',
  '</td>',
  '<td width="15" height="25"><a id="JustAnAnchor">Anchor</a></td>',
'</tr>'].join("");

document.write(xhtml);

You'll then need to solve the challenge of applying the XSLT transform in JavaScript.
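One way to approach that in a browser is the built-in XSLTProcessor. The following is only a minimal sketch under stated assumptions, not part of the original answer: the function name applyXslt is hypothetical, and the env parameter is an illustrative addition that defaults to the browser globals so the function can also be exercised with stand-ins outside a browser. Handling of text output may differ between browsers, so treat the result as something to verify.

```javascript
// Sketch: apply an XSLT stylesheet to an XML/XHTML string in a browser.
// `applyXslt` and `env` are hypothetical names introduced for illustration.
function applyXslt(xmlText, xsltText, env = globalThis) {
  const parser = new env.DOMParser();

  // Both inputs must be well-formed XML, which is why the markup
  // has to be converted to XHTML first.
  const xmlDoc = parser.parseFromString(xmlText, "application/xml");
  const xsltDoc = parser.parseFromString(xsltText, "application/xml");

  const processor = new env.XSLTProcessor();
  processor.importStylesheet(xsltDoc);

  // transformToFragment returns a DocumentFragment owned by the given
  // document; for a text-producing stylesheet we read its text content.
  const fragment = processor.transformToFragment(xmlDoc, env.document);
  return fragment.textContent;
}
```

In a page this would be called as applyXslt(xhtml, stylesheetText), where stylesheetText is assumed to hold the stylesheet source as a string.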

The following XSLT extracts the href from every <a> element that has one and writes them out as a comma-delimited list of quoted strings, which you can then use back in JavaScript. There should be no need to remove the trailing comma, since a JavaScript array literal tolerates one.

<?xml version="1.0" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:template match="/">
        <xsl:apply-templates select="//a[@href]"></xsl:apply-templates>
    </xsl:template>

    <xsl:template match="a">'<xsl:value-of select="@href"/>',</xsl:template>
</xsl:stylesheet>

Output:

'../myspace.com/index.html','../technotes.com/index.html',
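If that output arrives in JavaScript as a plain string, one possible way to turn it into an array and loop over it is sketched below. The function name parseHrefList is hypothetical, and the sketch assumes that no URL contains a comma or a single quote.

```javascript
// Sketch: convert the comma-delimited, single-quoted list into an array.
// Assumes no URL contains a comma or a single quote.
function parseHrefList(output) {
  return output
    .split(",")                                        // one piece per entry
    .map((item) => item.trim().replace(/^'|'$/g, "")) // drop surrounding quotes
    .filter((item) => item.length > 0);                // skip the empty piece after the trailing comma
}

const xsltOutput = "'../myspace.com/index.html','../technotes.com/index.html',";

// Process each URL in a loop.
for (const url of parseHrefList(xsltOutput)) {
  console.log(url);
}
```

Splitting the string avoids evaluating the output as code, at the cost of the comma and quote restriction noted above.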
Community
StuartLC
0

XSLT cannot parse JavaScript easily. It's the wrong tool for the job.

Here are some approaches you could pursue:

(1) Run the JavaScript, capture the resulting document, then apply XSLT to that. This may be troublesome if the document is not well-formed XML.

(2) Use regular expressions, e.g. grep, perl -e, or the JavaScript match function.

(3) Run the JavaScript, then use document.querySelectorAll('*[href]') to grab all the elements with an href and work from there.
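Approaches (2) and (3) can be sketched roughly as follows. The function names extractHrefs and collectHrefs are hypothetical. The regular expression assumes every href value is wrapped in double quotes, as in the question's sample, and collectHrefs assumes it is given a document (or element) after the script has run.

```javascript
// Approach (2), sketched: scan the raw .js source text with a regular expression.
// Assumes href values are double-quoted, as in the question's sample.
function extractHrefs(source) {
  const pattern = /\bhref\s*=\s*"([^"]*)"/gi;
  const urls = [];
  for (const match of source.matchAll(pattern)) {
    urls.push(match[1]); // capture group 1 is the URL between the quotes
  }
  return urls;
}

// Approach (3), sketched: query the rendered DOM after the script has run.
// `root` is expected to be a document or element.
function collectHrefs(root) {
  return Array.from(root.querySelectorAll("[href]"), (el) =>
    el.getAttribute("href")
  );
}

const sample = [
  "document.write('<a href=\"../myspace.com/index.html\" class=\"headerLink\">My Space</a>');",
  "document.write('<a href=\"../technotes.com/index.html\" class=\"headerLink\">Tech Notes</a>');",
].join("\n");

// Process each URL found in the source text in a loop.
for (const url of extractHrefs(sample)) {
  console.log(url);
}

// Only meaningful in a browser, where `document` exists.
if (typeof document !== "undefined") {
  for (const url of collectHrefs(document)) {
    console.log(url);
  }
}
```

getAttribute("href") returns the value as written (for example ../myspace.com/index.html), whereas the href property of an anchor element would return the resolved absolute URL.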

Mathias Müller
Jon Cooke