15

I want to print all the hrefs (links) from a website. Each href is on an 'a' tag, and each 'a' tag is inside an 'li' tag. I know how to select all the li's; I need a way to select all the a's within them to get the 'href' attribute. I tried the following, but it doesn't work.

li = driver.find_elements_by_tag_name('li')
for link in li:
     a_childrens = link.find_element_by_tag_name('a')

for a in a_children
     (print a.get_attribute('href'))

Thanks in advance.

Jesse

4 Answers

24

I recommend using css_selector instead of tag_name:

aTagsInLi = driver.find_elements_by_css_selector('li a')
for a in aTagsInLi:
    print(a.get_attribute('href'))
Buaban
  • This correctly selects and prints hrefs on the page, but not all of them. I think it's because the hrefs I'm actually looking for are inside a JavaScript-rendered page. I can see them in the inspector, but this script doesn't select them. – Jesse Oct 13 '16 at 13:32
  • @Jesse They might be inside an `IFRAME` or some portion of the page that loads late. You'll have to investigate and see since you have access to the page. – JeffC Oct 13 '16 at 13:47
  • @Jesse As JeffC suggested, those missing links might be in IFRAME. I can help if you give us the URL of your application. – Buaban Oct 13 '16 at 13:49
  • @Buaban I'm actually trying to gather the URL of every game on the Unibet football page. [https://www.unibet.eu/betting#filter/football] – Jesse Oct 13 '16 at 13:57
  • I already have a `time.sleep(5)` to let the website load properly. (A screenshot shows the whole page.) – Jesse Oct 13 '16 at 14:05
  • @Buaban Any idea? – Jesse Oct 13 '16 at 17:10
  • @Jesse Sorry, I cannot access the web. It might be network policy here. I will try tomorrow when I get home. – Buaban Oct 14 '16 at 02:59
  • @Buaban Ah no problem. Thanks. Let me know! – Jesse Oct 14 '16 at 07:44
  • Got it, guys! The links were actually stored in an unordered list, so I changed the above code to: – Jesse Oct 14 '16 at 17:55
  • aTagsInLi = driver.find_elements_by_css_selector('ul li a') for a in aTagsInLi: print (a.get_attribute('href')) – Jesse Oct 14 '16 at 17:56
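For links that JavaScript injects after the initial page load, a fixed `time.sleep(5)` can be too short or needlessly long. Below is a minimal polling sketch, assuming `driver` is a Selenium WebDriver. The helper name `wait_for_hrefs`, its parameters, and its defaults are hypothetical, chosen for illustration. It uses the generic `find_elements(by, value)` call with the locator string `"css selector"` (the value behind `By.CSS_SELECTOR`), because the `find_elements_by_*` helpers used in this thread were removed in later Selenium 4 releases. Selenium's own `WebDriverWait` is the more usual tool for this; the loop here only shows the idea.

```python
import time


def wait_for_hrefs(driver, selector="ul li a", min_count=1, timeout=10.0, poll=0.5):
    """Poll until at least `min_count` matching links carry an href, or time out.

    Returns the hrefs found on the last poll (possibly fewer than
    `min_count` if the timeout expired).
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string behind selenium's By.CSS_SELECTOR.
        elements = driver.find_elements("css selector", selector)
        hrefs = [e.get_attribute("href") for e in elements]
        hrefs = [h for h in hrefs if h]  # drop anchors without an href
        if len(hrefs) >= min_count or time.monotonic() >= deadline:
            return hrefs
        time.sleep(poll)
```

A call such as `for href in wait_for_hrefs(driver): print(href)` would replace the fixed sleep. If the links live inside an IFRAME, as suggested above, the driver would first have to switch into that frame.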
9

Try to select the links directly:

 links = driver.find_elements_by_tag_name('a')
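
The call above returns element objects, not URLs. Here is a small sketch of turning them into a de-duplicated list of hrefs. `unique_hrefs` is a hypothetical helper name, and it assumes only that each item has a `get_attribute` method, as Selenium elements do.

```python
def unique_hrefs(links):
    """Return each distinct, non-empty href from `links`, in first-seen order."""
    hrefs = (link.get_attribute("href") for link in links)
    # dict.fromkeys keeps insertion order and drops repeats.
    return list(dict.fromkeys(h for h in hrefs if h))
```

Usage would be along the lines of `for href in unique_hrefs(links): print(href)`.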
afonte
5

You have the right idea. The problem is that a_childrens = link.find_element_by_tag_name('a') does find the anchor in each list item, but you do nothing with it inside the loop, so each result is overwritten by the next one. After the loop, you're left with only the element from the last iteration. Note also that find_element_by_tag_name returns a single element, not a list, so there is nothing to iterate over in your second loop.

Your solution, correctly implemented, might look something like this:

list_items = driver.find_elements_by_tag_name("li")
for li in list_items:
    anchor_tag = li.find_element_by_tag_name("a")
    print(anchor_tag.get_attribute('href'))

That is, with the understanding that the HTML layout is as you described, something like:

<li><a href="foo">Hello</a></li>
<li><a href="bar">World!</a></li>
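
If some `li` elements hold no link, or several, the singular `find_element_by_tag_name` either raises `NoSuchElementException` or returns only the first match. Below is a hedged sketch of the nested version using the plural lookup on each list item. `hrefs_per_list_item` is a hypothetical helper name, and `"tag name"` is the locator string behind `By.TAG_NAME`, passed to the generic `find_elements(by, value)` call.

```python
def hrefs_per_list_item(driver):
    """Collect the href of every <a> nested in every <li> on the current page."""
    hrefs = []
    for li in driver.find_elements("tag name", "li"):
        # The plural lookup returns an empty list when an <li> has no <a>,
        # so no exception handling is needed here.
        for anchor in li.find_elements("tag name", "a"):
            href = anchor.get_attribute("href")
            if href:  # skip anchors without an href attribute
                hrefs.append(href)
    return hrefs
```

With nested lists the same link can be reached from more than one `li`, so the result may contain repeats; `list(dict.fromkeys(hrefs))` removes them while keeping order.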
sytech
1

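find_elements_by_xpath would do the trick. Selenium locators have to return elements rather than attribute nodes, so an expression ending in /@href is rejected; select the anchors and then read the attribute:

links = [a.get_attribute('href') for a in driver.find_elements_by_xpath('//li/a')]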
Ahmed Saad