How can I use XPath to retrieve a set of links that appear inside the list items of an ordered list? I need to retrieve all sections from a forum with the following HTML code:

<div id="pagewrapper" class="fixed">
<div id="toplinks" class="toplinks" style="position: relative; top: 145px;">
<div class="above_body" style="height: 210px;">
<div class="body_wrapper">
<div id="breadcrumb" class="breadcrumb">
<div id="pagetitle">
<ol id="forums" class="floatcontainer">
<li id="cat3" class="forumbit_nopost new L1">
<div class="forumhead tcat foruminfo L1 collapse">
<div class="tbody_left">
<div class="tbody_right">
<ol id="c_cat3" class="childforum">
<li id="forum9" class="forumbit_post new L2">
<div class="forumrow table">
<div class="foruminfo td" style="padding-top: 12px; padding-bottom: 12px;">
<img id="forum_statusicon_9" class="forumicon" alt="" src="elitex360/statusicon/forum_new-48.png">
<div class="forumdata">
<div class="datacontainer">
<div class="titleline">
<h2 class="forumtitle">
<a href="https://forums.com/forum/index">Forum index</a> <!-- get this link -->
</h2>
</div>
<p class="forumdescription">
</div>
</div>
</div>
<h4 class="nocss_label">Forum Actions:</h4>
<h4 class="nocss_label">Forum Statistics:</h4>
<ul class="forumstats td" style="padding-top: 18px; padding-bottom: 12px;">
<div class="forumlastpost td">
</div>
</li>
<li id="forum22" class="forumbit_post new L2">
<li id="forum40" class="forumbit_post new L2">
</ol>
<div class="tbody_under"></div>
</div>
</div>
<div class="tfoot">
</li>
<li id="cat4" class="forumbit_nopost new L1">
<li id="cat52" class="forumbit_nopost new L1">
<li id="cat5" class="forumbit_nopost new L1">
<li id="cat6" class="forumbit_nopost new L1">
<li id="cat7" class="forumbit_nopost old L1">
</ol>

The section links I have to retrieve are marked in the code above (<!-- get this link -->). I am now using the following string to retrieve all list items:

//div[@id='pagewrapper']/div[3]/ol

But I don't know how to "enter" each list item and retrieve the contents of the link inside it. The examples I found require knowing the number of list items before accessing them. That does not work here, because a forum may have any number of list items (the template is for a forum engine, not one forum in particular).

How can I retrieve all links within the list items?

Sebi
  • Consider including any input sample as a code sample here on Stack Overflow instead of linking to an image. As for XPath, if you use `//div[@id='pagewrapper']/div[3]/ol/li//a[@href]`, you will select all link elements inside the `li` list item elements inside that ordered list. – Martin Honnen Feb 03 '16 at 11:45
  • I'm confused. What "erased line"? Next, `//div[@id='pagewrapper']/div[3]/ol` does not give you list items, it gives you lists. Further, there is not a single link in any of the list items in your sample HTML. Maybe, instead of paraphrasing what you want to select, write down an exact list of HTML elements you want to select. – Tomalak Feb 03 '16 at 11:56
  • @Tomalak In the original post there was an image with the link I wanted to extract crossed out. I've updated the question. – Sebi Feb 03 '16 at 12:01
  • Okay, and what exactly is the problem? The XPath to get that link is so straightforward that I don't dare to write it down because there must be a catch to it. – Tomalak Feb 03 '16 at 12:18

1 Answer

Try the XPath below to get the URL:

//a[contains(.,'Forum index')]/@href

If you want all li elements in the ol, as I understand from your question, then the XPath is:

//div[@id='pagewrapper']//li[@id='cat3']//ol//li
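To show that the number of list items does not need to be known in advance, here is a minimal Python sketch. It assumes a trimmed, well-formed stand-in for the forum markup (the `SAMPLE` string is hypothetical, not the real page) and uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, so the expression is written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed and well-formed stand-in for the forum markup.
SAMPLE = """
<div id="pagewrapper">
  <ol id="forums">
    <li id="cat3">
      <ol id="c_cat3" class="childforum">
        <li id="forum9"/>
        <li id="forum22"/>
        <li id="forum40"/>
      </ol>
    </li>
    <li id="cat4"/>
  </ol>
</div>
"""

root = ET.fromstring(SAMPLE)  # root is the div with id="pagewrapper"

# Every li directly inside any ol nested under the cat3 category,
# however many there happen to be.
items = root.findall(".//li[@id='cat3']//ol/li")
item_ids = [li.get("id") for li in items]
print(item_ids)
```

The query returns a list, so the caller simply iterates over whatever matched. Real forum pages are rarely well-formed XML, so an HTML-aware parser would likely be needed in practice.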

I think the XPath below is what you are expecting:

   //div[@id='pagewrapper']//li//a/@href
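As a rough illustration of collecting the section links, the sketch below selects the `a` elements inside the forum title headings and reads their `href` and text. The `SAMPLE` markup is a simplified, well-formed assumption, and the second link (`.../forum/news`) is invented purely for the example. Because `xml.etree.ElementTree` does not support the `/@href` step, the attribute is read with `.get()` instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the forum markup; the second
# link is made up for illustration only.
SAMPLE = """
<div id="pagewrapper">
  <ol id="forums">
    <li id="cat3">
      <ol id="c_cat3" class="childforum">
        <li id="forum9">
          <h2 class="forumtitle"><a href="https://forums.com/forum/index">Forum index</a></h2>
        </li>
        <li id="forum22">
          <h2 class="forumtitle"><a href="https://forums.com/forum/news">News</a></h2>
        </li>
      </ol>
    </li>
  </ol>
</div>
"""

root = ET.fromstring(SAMPLE)

# Select the link inside every forum title heading under the main list.
anchors = root.findall(".//ol[@id='forums']//h2[@class='forumtitle']/a")

# Pair each link target with its label.
links = [(a.get("href"), a.text) for a in anchors]
for href, label in links:
    print(label, "->", href)
```

With a full XPath 1.0 engine, the same selection could end in `/@href` to return the attribute values directly.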

Hope it will help you :)

Shubham Jain