
I need help parsing href attributes. Currently everything is parsed as text, but I need to parse the links as well so that I can send them to a PHP page later using AJAX.

My HTML looks like this:

<div id="word_content">
<br>Testing Time: 2015-10-29 17:57:11<br>
    Total Age: 19<br>
    Total Friend: 9<br>
    Total Family: 10<br>
    <br>
Here are the suggestions  - Him_530037_: <a href="www.mytarget.com="_blank">93358546</a>
<h3>Overview</h3><br>
<ul>
    <li>(The overlap provided is not good)</li>
</ul>

<h3>Structure</h3><br>
<h4>Target:</h4><br>
<ul>
    <li>Audience.</li>
    <li>Lookalike</li>
    <li>Overlap of Audience</li> 
    <a href="https://www.myPage.com/lolPagess/?id=06" target="_blank">06<font name="names" hidden="" style="display: inline;"> - Page Likes</font></a>           
</ul>
</div>

My jQuery code is something like this:

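var headTags = $("div#word_content").find("*").filter(function () {
    // Keep elements whose tag name starts with "h" (headings, but also <hr>)
    return /^h/i.test(this.nodeName);
});

var output = {};

$(headTags).each(function () {
    var currentHead = $(this);

    // .next() skips text nodes, so this skips the <br> after the heading
    var nextNextElem = currentHead.next().next();
    var innerText = [];
    if (nextNextElem.prop("tagName") == "UL") {
        // Only the text of each <li> is collected; the hrefs are lost here
        nextNextElem.find("li").each(function () {
            innerText.push($(this).text());
        });
    }

    output[currentHead.text()] = innerText;
});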

Currently the jQuery code fetches the data, but it captures only the text, not the link. I need to parse the link as well so that it can be used on further pages. Can someone please help?
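A sketch of how the loop above could keep both the label and the `href`, written against standard DOM methods (`nextElementSibling`, `childNodes`, `querySelectorAll`, `getAttribute`) instead of jQuery. All helper names are hypothetical. It assumes, as in the HTML above, that each heading is followed by zero or more `<br>` elements and then its `<ul>`, and that a link's visible label is the text placed directly inside the `<a>`, so the hidden `<font>` suffix is dropped. It also picks up the `<a>` that sits directly inside the `<ul>` rather than in an `<li>`, which `find("li")` never visits. Unlike the `/^h/i` filter, `collectAll` only looks at `h1` to `h6`.

```javascript
// DOM node type for text nodes (Node.TEXT_NODE in the DOM spec).
const TEXT_NODE = 3;

// Hypothetical helper: skip any <br> elements after a heading and return the
// <ul> that follows, or null if another element (e.g. a heading) comes first.
function listAfterHeading(heading) {
  let el = heading.nextElementSibling;
  while (el && el.tagName === "BR") {
    el = el.nextElementSibling;
  }
  return el && el.tagName === "UL" ? el : null;
}

// Hypothetical helper: text placed directly inside an element (not inside
// its child elements), with runs of whitespace collapsed.
function ownText(el) {
  return Array.from(el.childNodes)
    .filter((node) => node.nodeType === TEXT_NODE)
    .map((node) => node.nodeValue)
    .join(" ")
    .replace(/\s+/g, " ")
    .trim();
}

// Hypothetical helper: reduce an <a> to { text, href }. Using ownText means
// the hidden <font> suffix (" - Page Likes") is not part of the label.
function linkRecord(anchor) {
  return { text: ownText(anchor), href: anchor.getAttribute("href") };
}

// Hypothetical helper: one entry per <li>, with the links found inside it,
// plus one entry per <a> placed directly inside the <ul>.
function collectList(ul) {
  const items = [];
  for (const child of Array.from(ul.children)) {
    if (child.tagName === "LI") {
      const links = Array.from(child.querySelectorAll("a")).map(linkRecord);
      items.push({ text: ownText(child), links });
    } else if (child.tagName === "A") {
      const link = linkRecord(child);
      items.push({ text: link.text, links: [link] });
    }
  }
  return items;
}

// Hypothetical helper: { headingText: items } for every h1-h6 in the container.
function collectAll(container) {
  const output = {};
  const headings = container.querySelectorAll("h1, h2, h3, h4, h5, h6");
  for (const heading of Array.from(headings)) {
    const ul = listAfterHeading(heading);
    output[heading.textContent.trim()] = ul ? collectList(ul) : [];
  }
  return output;
}
```

In the page, `collectAll(document.getElementById("word_content"))` would be expected to map "Target:" to four entries, the last being `{ text: "06", links: [{ text: "06", href: "https://www.myPage.com/lolPagess/?id=06" }] }`, and "Structure" to an empty array, matching the original loop's behavior for a heading followed by another heading.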

user4943236

3 Answers


Use this:

nextNextElem.find("a").each(function () {
    innerText.push($(this).text() + " & href is:" + $(this).attr("href"));
});

var headTags = $("div#word_content").find("*").filter(function () {
    return /^h/i.test(this.nodeName);
});

var output = {};

$(headTags).each(function () {
    var currentHead = $(this);
    var nextNextElem = currentHead.next().next();
    var innerText1 = [];
    if (nextNextElem.prop("tagName") == "UL") {
        nextNextElem.find("li").each(function () {
            // Text before the first child element of the <li>
            innerText1.push(this.firstChild.data);
            $(this).children().each(function () {
                // Rebuild each child link as markup; innerText skips the
                // display:none <font> inside it
                innerText1.push("<a href='" + $(this).attr("href") + "'>" + this.innerText + "</a>");
                // Text that follows the link, e.g. " (MM, SG), "
                if ($(this).prop("nextSibling")) {
                    innerText1.push($(this).prop("nextSibling").nodeValue);
                }
            });
        });
    }
    output[currentHead.text()] = innerText1;
});
console.log(output);
$("#data").html(JSON.stringify(output));
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
   <div id="word_content">
<br>Testing Time: 2015-10-29 17:57:11<br>
Total Age: 19<br>
Total Friend: 9<br>
Total Family: 10<br>
<br>
Here are the suggestions  - Him_530037_: <a href="www.mytarget.com="_blank">93358546</a>
<h3>Overview</h3><br>
<ul> 
<li>Multiple Countries 
<a href="https://www.myTarget.com/ads/?id=603" target="_blank">603<font name="names" hidden="" style="display: none;"> - Post: "သင့္ရဲ့ Data အသံုးျပဳ မွုကို အေၾကာင္းၾကားေပးေသာ..."</font></a> (MM, SG), 
<a href="https://www.myTarget.com/ads/?id=602" target="_blank">602<font name="names" hidden="" style="display: none;"> - Post: "Mynamar pics."</font></a></li> 

</ul>
</div>
<span>OUTPUT AREA:</span>
<div id="data"></div>
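The updated code above builds each link as an HTML string by concatenation, so an href or label containing `&`, `<`, or a quote would produce broken markup. A sketch of an escaping variant, with hypothetical helper names, that also switches to double-quoted attributes:

```javascript
// Escape the characters that are significant in HTML text and in
// quoted attribute values. "&" must be replaced first.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: rebuild one link as markup, as the answer does,
// but with the href and the label escaped.
function anchorHtml(href, text) {
  return '<a href="' + escapeHtml(href) + '">' + escapeHtml(text) + "</a>";
}
```

Inside the answer's loop, the push would then become `innerText1.push(anchorHtml($(this).attr("href"), this.innerText));`.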
Suchit kumar
  • Let me test it; I'll get back to you. – user4943236 Nov 16 '15 at 09:48
  • I'm afraid the links are not being generated; it still uses innerText. – user4943236 Nov 16 '15 at 09:54
  • @user4943236 Do you mean you want the href as a link? If so, try the updated one. – Suchit kumar Nov 16 '15 at 09:56
  • Actually, I need the text along with the hrefs, i.e. I need to parse "https://www.myPage.com/lolPagess/?id=06" as the href and "06" as the text. – user4943236 Nov 16 '15 at 10:05
  • Actually, there are some links on the page along with some text, so I need to parse all the links and the text. I'm sending this to another PHP file, where I extract them to display as links. Currently the hrefs are not being parsed; only the inner text is. – user4943236 Nov 16 '15 at 10:07
  • I'm checking the browser console and the values generated are still text: 60306 -Myanmar - Page Likes (1,123 USD - 1.86%), – user4943236 Nov 16 '15 at 10:09
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/95225/discussion-between-user4943236-and-suchit). – user4943236 Nov 16 '15 at 10:18
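The comments say the collected data is to be sent to a PHP page. One hedged way to do that is to send the whole object as a single JSON-encoded form field. The endpoint `save_links.php` and the field name `data` are placeholders, not taken from the question. On the PHP side, `json_decode($_POST["data"], true)` would turn the field back into an array.

```javascript
// Placeholder endpoint: the question does not name the PHP page.
const ENDPOINT = "save_links.php";

// Encode the whole object as one form field, "data", holding JSON.
function buildPayload(output) {
  return new URLSearchParams({ data: JSON.stringify(output) }).toString();
}

// Browser-only: POST the payload as an ordinary form submission.
// With jQuery, $.post(ENDPOINT, { data: JSON.stringify(output) }) is similar.
function sendLinks(output) {
  return fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: buildPayload(output),
  });
}
```

Sending JSON keeps the text and href of each link paired, so the PHP page does not have to split strings such as "06 & href is:...".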

You can use something like this to parse the links on the page:

$("a").each(function (i, o) {
    console.log("Link: " + (i + 1));
    console.log("  Text is: " + $(o).text());
    console.log("  Link is: " + $(o).attr("href"));
});

Resulting href values:

www.mytarget.com=
https://www.myPage.com/lolPagess/?id=06

See JsFiddle
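The first value above, `www.mytarget.com=`, comes from the question's malformed markup `href="www.mytarget.com="_blank"`: the attribute value ends at the second quote. It is also relative, because it has no scheme. `.attr("href")` returns the raw attribute text, while the DOM property (`o.href`, or `$(o).prop("href")`) returns the URL resolved against the page. A small sketch of that resolution with the standard `URL` constructor; the base URLs in the test are assumed examples, and `isAbsolute` is a hypothetical helper:

```javascript
// Resolve a raw href attribute value against a base URL, which is what
// the browser does when you read an anchor's href property.
function resolveHref(raw, base) {
  return new URL(raw, base).href;
}

// Hypothetical helper: true when the raw value carries its own scheme
// and therefore does not depend on the page's URL.
function isAbsolute(raw) {
  try {
    new URL(raw); // throws a TypeError when there is no scheme to parse
    return true;
  } catch {
    return false;
  }
}
```

Note that resolution also normalizes the URL, for example lowercasing the host, so `https://www.myPage.com/lolPagess/?id=06` comes back as `https://www.mypage.com/lolPagess/?id=06`.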

DDan

Check the href of every a element:

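$("a").each(function () {
    var href = $(this).attr("href");
    // Use the result instead of discarding it
    if (isUrlValid(href)) {
        console.log("Valid link: " + href);
    }
});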

The isUrlValid function is borrowed from the Stack Overflow question "Validating url with jQuery without the validate-plugin?":

  function isUrlValid(url) {
        return /^(https?|s?ftp):\/\/(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*@)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|[\uE000-\uF8FF]|\/|\?)*)?(#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|\/|\?)*)?$/i.test(url);
    }

This regex tests whether a string is a valid absolute URL with an http, https, ftp, or sftp scheme.
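If a regex this long is hard to maintain, a looser alternative is to let the standard `URL` constructor parse the string and then check the scheme. This is a different technique from the regex above and is not equivalent: for example, `new URL` accepts a host without a dot such as `http://localhost`, which the regex rejects. The function names are hypothetical.

```javascript
// Schemes the regex above accepts: http, https, ftp, sftp.
const ALLOWED_SCHEMES = new Set(["http:", "https:", "ftp:", "sftp:"]);

// Looser check: the string must parse as an absolute URL with an allowed scheme.
function isUrlValidLoose(url) {
  try {
    return ALLOWED_SCHEMES.has(new URL(url).protocol);
  } catch {
    return false; // new URL throws a TypeError for input it cannot parse
  }
}

// Keep only the { text, href } records whose href passes the check.
function validLinks(records) {
  return records.filter((r) => isUrlValidLoose(r.href));
}
```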

Thomas