so I'm pretty new to this, and I haven't been able to find anything on google on this question.
I'm using request and lxml with Python, I've seen that there's a lot of different modules for web scraping, but is there any reason to choose one over the other? Can you do the same stuff with requests/lxml as you can with for example BeautifulSoup?
Anyway, here's my actual question;
This is my code:
import requests
from lxml import html
# Login data
inputUrl = 'http://forum.mytestsite.com/login'
usr = 'myusername'
pwd = 'mypassword'
payload = dict(login=usr, password=pwd)
# Open session
with requests.Session() as s:
# Login
s.post(inputUrl, data=payload)
# Get page data
pageResult = s.get('http://forum.mytestsite.com/icons/', allow_redirects=False)
pageResult = html.fromstring(pageResult.content)
pageIcons = pageResult.xpath('//script[@id="table-icons"]/text()')
print pageIcons[0]
The result when printing pageIcons[0]:
<ul id="icons">
{{#each icons}}
<li data-handle="{{handle}}">
<img src="{{image_path}}" alt="{{desc_or_name this}}" title="{{desc_or_name this}}">
</li>
{{/each}}
</ul>
This is the website/js code that generates the icons:
<script id="table-icons" type="text/x-handlebars-template">
<ul id="icons">
{{#each icons}}
<li data-handle="{{handle}}">
<img src="{{image_path}}" alt="{{desc_or_name this}}" title="{{desc_or_name this}}">
</li>
{{/each}}
</ul>
</script>
And here's the result on the page:
<ul id="icons">
<li data-handle="558FSTBI" class="">
<img src="http://testsite.com/icons/558FSTBI.1.png" alt="Icon 1" title="Icon 1">
</li>
<li data-handle="310AYTZI">
<img src="http://testsite.com/icons/310AYTZI.1.png" alt="Icon 2" title="Icon 2">
</li>
<li data-handle="669PQXBI" class="">
<img src="http://testsite.com/icons/669PQXBI.1.png" alt="Icon 3" title="Icon 3">
</li>
</ul>
My goal:
What I would like to do is to retrieve all of li data-handles, but I haven't been able to figure out how to retrieve this data. So my goal is to retrieve all of the icon paths and their titles, could anyone help me out here? I'd really appreciate any help :)