I have a string of HTML, and would like to replace the values in each href attribute on anchors with a modified value at a later time. To do this, I'd like to grab the index into the HTML string that the href attribute starts at (and the character it ends at), or perhaps the character in the HTML string that the anchor starts at (and the character it ends at). For example, if I have the string:

<html><head></head><body><a href='http://example.com'/></body></html>

I'd like to write a method that returns [34, 51]: the index of the first character of the href value and the index of the last. As far as I can tell, jQuery does not expose the position in the original HTML string of an element matched by a selector, and I haven't found any other library that provides this information.

If this is not possible with an existing JavaScript library (without building a new parser), is there a library in another language, particularly Ruby, that provides this?

AMWJ

5 Answers


Does this example help? I put it together from an answer I gave to another Stack Overflow question.

<!DOCTYPE html>
<html lang="en">
<head>
    <title>Bootstrap Example</title>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    @*MAKE SURE YOU HAVE A reference to jquery here-I have it in my bundle*@
    <script type="text/javascript">
        $(function () {
            $("#aBtn").click(function () {
                var elems = $("div[data-src]");
                var lastOne = elems[elems.length - 1]
                //replacing the last one with C, so C is shown twice-last two
                var grandchild = lastOne.children[0].children[0]
                grandchild.srcset = "https://dummyimage.com/1024x768/000/ffffff.jpg&text=large+C"
            })
        })
    </script>
</head>
<body>
    <input type="button" id="aBtn" value="BtnTriggerInsteadOfOnLoad" />
    <div class="row">
        <div class="columns small-12">
            <div class="responsive-picture" data-src="https://dummyimage.com/1024x768/000/ffffff.jpg&text=large+A">
                <picture>
                    <!--[if IE 9]><video style="display: none;"><![endif]-->
                    <source media="(min-width: 64em)" srcset="https://dummyimage.com/1024x768/000/ffffff.jpg&text=large+A">
                    <source media="(min-width: 40em)" srcset="https://dummyimage.com/640x480/000/ffffff.jpg&text=meduim+A">
                    <source media="screen" srcset="https://dummyimage.com/320x240/000/ffffff.jpg&text=small+A">
                    <!--[if IE 9]></video><![endif]-->
                    <img alt="Placeholder Picture" src="transparent.gif">
                </picture>
            </div>
            <div class="responsive-picture" data-src="https://dummyimage.com/1024x768/000/ffffff.jpg&text=large+B">
                <picture>
                    <!--[if IE 9]><video style="display: none;"><![endif]-->
                    <source media="(min-width: 64em)" srcset="https://dummyimage.com/1024x768/000/ffffff.jpg&text=large+B">
                    <source media="(min-width: 40em)" srcset="https://dummyimage.com/640x480/000/ffffff.jpg&text=medium+B">
                    <source media="screen" srcset="https://dummyimage.com/320x240/000/ffffff.jpg&text=small+B">
                    <!--[if IE 9]></video><![endif]-->
                    <img alt="Placeholder Picture" src="transparent.gif">
                </picture>
            </div>
            <div class="responsive-picture" data-src="https://dummyimage.com/1024x768/000/ffffff.jpg&text=large+C">
                <picture>
                    <img alt="Placeholder Picture" src="transparent.gif" srcset="https://dummyimage.com/1024x768/000/ffffff.jpg&text=large+C">
                </picture>
            </div>
            @*I PUT THE SRCE BACK ON THE DIV like it is suppose to be*@
            <div class="responsive-picture" data-src="https://dummyimage.com/1024x768/000/ff0000.jpg&text=large+D">
                <picture>
                    <img alt="Placeholder Picture" src="transparent.gif" srcset="https://dummyimage.com/1024x768/000/ff0000.jpg&text=large+D"> 
                </picture>
            </div>
        </div>
    </div>
</body>
</html>
kblau

I'm not sure why you need to search through a string of HTML rather than working directly with the DOM, but the function below returns the indices you asked for. If your string contains multiple anchor tags, you'll want to write a recursive function along the same lines.

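var htmlString = "<html><head></head><body><a href='http://example.com'/></body></html>";

// Assumes the first href in the string is wrapped in single quotes.
var getUrl = function (string) {
  var hrefStart = string.indexOf('href');
  // first character after the opening quote
  var httpStart = string.indexOf("'", hrefStart) + 1;
  // last character before the closing quote
  var httpEnd = string.indexOf("'", httpStart) - 1;

  return [httpStart, httpEnd];
};

console.log(getUrl(htmlString)); // [34, 51]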
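
For a string with several anchors, the recursive variant mentioned above might look like the sketch below. `getAllUrls` and its `from` parameter are hypothetical names, not part of the original answer. It assumes every href value is quoted (single or double quotes) and written as `href=` with no surrounding spaces. It is plain string scanning rather than HTML parsing, so it would also pick up text such as `data-href=`.

```javascript
// Hypothetical helper: recursively collects inclusive [start, end] index pairs
// for every quoted href value found at or after position `from`.
function getAllUrls(string, from) {
  from = from || 0;
  var hrefStart = string.indexOf('href=', from);
  if (hrefStart === -1) {
    return []; // base case: no more href attributes
  }
  var quote = string.charAt(hrefStart + 5); // character right after "href="
  if (quote !== "'" && quote !== '"') {
    // Unquoted value: skip it rather than guess where it ends.
    return getAllUrls(string, hrefStart + 5);
  }
  var valueStart = hrefStart + 6; // first character of the value
  var closing = string.indexOf(quote, valueStart);
  if (closing === -1) {
    return []; // unterminated attribute: stop here
  }
  // Record this value, then recurse on the rest of the string.
  return [[valueStart, closing - 1]].concat(getAllUrls(string, closing + 1));
}

var twoAnchors = "<html><head></head><body><a href='http://example.com'/>" +
  "<a href=\"http://example.org\"/></body></html>";
console.log(getAllUrls(twoAnchors)); // [[34, 51], [64, 81]]
```

Each call handles one attribute and hands the remainder of the string to the next call, so the recursion depth equals the number of href attributes. For very large documents a loop would avoid deep call stacks.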
Bobby Speirs

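You can also do it using plain JavaScript, with a bit of parsing:

function getHrefsPositions(inputHtml){

    // Assumes every attribute is written exactly as href='...' (single quotes, no spaces).
    var currentIndex = inputHtml.indexOf("href");
    var results = [];

    while (currentIndex != -1){
        // currentIndex + 6 skips href=' and lands on the first character of the value
        var closingQuote = inputHtml.indexOf("'", currentIndex + 6);
        if (closingQuote == -1) break; // unterminated attribute

        // inclusive indices of the first and last characters of the value
        results.push([currentIndex + 6, closingQuote - 1]);
        // continue from the next href, not from the next quote
        currentIndex = inputHtml.indexOf("href", closingQuote + 1);
    }

    return results;
}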

<html>
<head>
<script>

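function getHrefs(inputHtml){
 
 // Assumes every attribute is written exactly as href='...' (single quotes, no spaces).
 var currentIndex = inputHtml.indexOf("href");
 var results = [];
 
 while (currentIndex != -1){
  // currentIndex + 6 skips href=' and lands on the first character of the value
  var closingQuote = inputHtml.indexOf("'", currentIndex + 6);
  if (closingQuote == -1) break; // unterminated attribute
  
  // inclusive indices of the first and last characters of the value
  results.push([currentIndex + 6, closingQuote - 1]);
  // continue from the next href, not from the next quote
  currentIndex = inputHtml.indexOf("href", closingQuote + 1);
 }

 alert(results);
 return results;
}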

getHrefs("<html><head></head><body><a href='http://example.com'/><a href='http://example.com'/></body></html>");
</script>

</head>

<body>

</body>
</html>
Isac

Why not just replace the href value using jQuery, and then convert the resulting DOM into a string?

$('a').attr('href', 'http://www.example.com');
var htmlString = $('html')[0].outerHTML;
console.log(htmlString);
Jack Taylor
  • Because the actual modification is not happening in Javascript; it's happening in Ruby. It doesn't look like there's a good Ruby client that won't make modifications (e.g. stripping whitespace) to the resultant HTML string. – AMWJ Jun 13 '17 at 00:00

The following regex may help:

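var pattern = /href=(["'])(?:(?=(\\?))\2.)*?\1/igm;
var html = "<html><head></head><body><a href='http://example.com'/></body></html>";
var match; // declared explicitly to avoid an implicit global

while ((match = pattern.exec(html)) !== null) {
  // match.index is where "href=" starts; pattern.lastIndex is one past the closing quote
  console.log(match.index + ' ' + pattern.lastIndex); // 28 53
}

console.log(html[28 + 6]); // "h": first character of the href value (skips href= and the opening quote)
console.log(html[53 - 2]); // "m": last character of the href value (53 - 1 is the closing quote)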

Partly taken from "Return positions of a regex match() in Javascript?" and now updated with a pattern from "Return positions of a regex match() in Javascript?". Since you are working with Ruby, I'd imagine you can run the same or a similar regular expression in Ruby to replace each match with the text you want.
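
To illustrate the "record the offsets now, replace later" workflow from the question, here is a minimal sketch. `findHrefRanges` and `replaceRanges` are hypothetical helpers, and the pattern is a simpler one than the regex above: it assumes quoted values with no line breaks and no escaped quotes inside them. Replacements are applied from the last range to the first so that earlier offsets stay valid while the string changes length.

```javascript
// Hypothetical helper: returns inclusive [start, end] offsets of every
// quoted href value, in document order.
function findHrefRanges(html) {
  var pattern = /href=(["'])(.*?)\1/gi; // simplified pattern (assumption)
  var ranges = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    // Skip "href=" (5 characters) plus the opening quote.
    var start = match.index + 6;
    ranges.push([start, start + match[2].length - 1]);
  }
  return ranges;
}

// Hypothetical helper: splices replacements[i] into ranges[i].
// Assumes the ranges do not overlap and are sorted in ascending order.
function replaceRanges(html, ranges, replacements) {
  var result = html;
  for (var i = ranges.length - 1; i >= 0; i--) {
    result = result.slice(0, ranges[i][0]) +
      replacements[i] +
      result.slice(ranges[i][1] + 1);
  }
  return result;
}

var source = "<html><head></head><body><a href='http://example.com'/></body></html>";
var ranges = findHrefRanges(source);
console.log(ranges); // [[34, 51]]
console.log(replaceRanges(source, ranges, ['/new']));
// <html><head></head><body><a href='/new'/></body></html>
```

Because only the characters inside each range are touched, the rest of the HTML string (whitespace included) is left exactly as it was. The same slicing approach should carry over to Ruby once the offsets are known.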

Royalty