I have a list of about 80 sites whose HTML I need to download and then query with XPath. My doubt is similar to this one: How to Multi-thread an Operation Within a Loop in Python. But my actual question is how to make the requests work inside a loop, and how I would put them inside a def.
I tried the usual approach, i.e. several requests.get() calls one after another, but it isn't viable because it takes a LOT of time to finish all the requests...
I looked at this, but apparently it only gets the site status. Searching other links, I found this, but I couldn't understand it.
Does the grequests approach also only get the status? I need to get the HTML code and store it in either a list or a variable.
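For what it's worth, `grequests.map` returns full `requests` `Response` objects (or `None` for a failed request), so `.text` on each one holds the HTML, not only the status code. Below is a minimal sketch of the same idea using only the standard library instead of grequests: a `concurrent.futures` thread pool plus `urllib.request`, both available in Python 3.4. The helper names `fetch_html` and `fetch_all`, the 10-thread pool, the 10-second timeout and the UTF-8 fallback are my own assumptions, not anything from the linked posts.

```python
import concurrent.futures
import urllib.request


def fetch_html(url, timeout=10):
    """Download one page and return its HTML as a str (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Assumption: fall back to UTF-8 when the server sends no charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_all(urls, fetch=fetch_html, max_workers=10):
    """Fetch every URL in parallel threads; return the HTML strings in the same
    order as `urls`, with None for any page that failed to download.

    `fetch` can be swapped for another downloader, e.g. lambda u: requests.get(u).text.
    The default pool size of 10 is an arbitrary assumption; tune it for ~80 sites.
    """
    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception as exc:  # network errors, HTTP errors, timeouts, bad charsets
            print("Failed to fetch {}: {}".format(url, exc))
            return None

    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map runs the downloads concurrently but yields the
        # results in the original order of `urls`.
        return list(executor.map(safe_fetch, urls))
```

With the URL list from the example below, `pages = fetch_all(urls)` would give one HTML string per site (or `None` where a download failed), in the same order as `urls`. Threads fit this case because the work is almost entirely waiting on the network.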
To run XPath on the pages and convert the HTML to a string, I was thinking of using lxml.
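A small sketch of how lxml could cover both parts, assuming the downloaded HTML is already a Python `str`: `lxml.html.fromstring` builds an element tree you can call `.xpath()` on, and `lxml.html.tostring(..., encoding="unicode")` turns a matched element back into a string. The `extract` helper is hypothetical, and it assumes the XPath expression selects elements (not `text()` or attributes), since it calls `text_content()` on each match.

```python
from lxml import html


def extract(page_html, xpath_expr):
    """Hypothetical helper: parse an HTML string and return (text, markup)
    for each element matched by `xpath_expr` (which must select elements)."""
    tree = html.fromstring(page_html)  # str -> element tree, ready for XPath
    matches = []
    for element in tree.xpath(xpath_expr):
        matches.append((
            element.text_content(),                      # visible text, tags stripped
            html.tostring(element, encoding="unicode"),  # element serialized back to a str
        ))
    return matches
```

For example, `extract("<html><body><div id='price'><b>42</b> USD</div></body></html>", "//div[@id='price']")` should return one pair whose text part is `"42 USD"`.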
Here is an example, just so you understand what I need (I'm using the grequests and lxml approach to make it easy to follow):
```python
import grequests
from lxml import html
values = []
urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com'
]
values[len(values)] = html.fromstring(grequests.get(u) for u in urls) # How do I remove the for?
if values[len(values)] == 1:
    Value_Search_One = values[len(values)].xpath(xpath_one)
if values[len(values)] == 2:
    Value_Search_Two = values[len(values)].xpath(xpath_two)
if values[len(values)] == 3:
    Value_Search_Three = values[len(values)].xpath(xpath_three)
```
I know there are a lot of errors in this code, but I just wanted to give you an idea of the result I'm after.
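The `if values[len(values)] == 1` chain seems to be trying to apply a different XPath expression to each page. One way to express that, sketched under the assumption that the pages are already a list of HTML strings (`None` for failed downloads) in the same order as a list of XPath expressions, is to pair them with `zip`. `search_pages` is a hypothetical helper name.

```python
from lxml import html


def search_pages(pages, xpaths):
    """Hypothetical helper: run the i-th XPath expression against the i-th page.

    `pages`  - list of HTML strings, None where a download failed
    `xpaths` - list of XPath expressions, one per page
    Returns one result list per page (None for missing pages). zip() stops at
    the shorter of the two lists.
    """
    results = []
    for page_html, xpath_expr in zip(pages, xpaths):
        if page_html is None:
            results.append(None)  # keep positions aligned with the input
            continue
        tree = html.fromstring(page_html)
        results.append(tree.xpath(xpath_expr))
    return results
```

With the names from the question, that would look like `Value_Search_One, Value_Search_Two, Value_Search_Three = search_pages(pages, [xpath_one, xpath_two, xpath_three])`, which avoids indexing with `len(values)` entirely.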
As you can see, I'm completely lost on what to do and how to do it. If anyone can help me put together code that fetches the HTML of many pages quickly, I'd appreciate it. I've read a lot, but I'm still not sure which approach to take.
Note: the answer doesn't have to use the approaches from the links I mentioned; if anyone knows an easier or more useful method, I'm open to it.
*I am using Python 3.4.