How to request many HTML?

Question

I have a list of about 80 sites where I need to get HTML and after that use XPath on them. A doubt similar to this: How to Multi-thread an Operation Within a Loop in Python But the question itself is how to make requests respect a loop, how would I put them inside def?

I tried to do it by normal method, ie several request.get() but it is not a viable method as it takes a LOT time to finish all requests ...

I look this but apparently it is just to get the site status, and searching other link, I found this but I could not understand

Has the grequests method but it only gets the status too?

I need to get the HTML code and save it either in a vector or a variable.

To use xpath and transform html to string, I was thinking of using lxml

An example just for you to understand what I need (I will use the grequest and lxml methodology to make it easy to understand):

import grequest
from lxml import html

values = []
urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com'
]
values[len(values)] = html.fromstring(grequests.get(u) for u in urls) #How retire the for?
if values[len(values)] == 1:
   Value_Search_One = values[len(values)].xpath(xpath_one)
if values[len(values)] == 2:
   Value_Search_Two = values[len(values)].xpath(xpath_two)
if values[len(values)] == 3:
   Value_Search_Three = values[len(values)].xpath(xpath_three)

I know there are a lot of errors in this code, but I just wanted to give you an idea of the result I need to find.

As you can see, I am totally lost in what and how to do it. If anyone can help me put together code to make multiple HTML requests with one quick method, I'd appreciate it. I've read a lot of things but I'm not sure what to do.

Note: It doesn't have to be just these examples of the links I quoted, if anyone knows any easier or more useful method, I'm accepting.

*I am using python 3.4

Possible duplicate of [How to Multi-thread an Operation Within a Loop in Python](https://stackoverflow.com/questions/15143837/how-to-multi-thread-an-operation-within-a-loop-in-python) — crookedleaf, Oct 23 '19 at 18:20
I added in topic why exactly isn't it. I am unsure how to mount a "request.get()" operation, I know this can be solved with multi-thread but I don't know how to do it — Pablo Abreu, Oct 23 '19 at 18:31
import `thread` library, then make a function to get just one url request and save it, then inside the loop make thread for every item and start the thread. make sure to join all the threads at the end of the program. — Atreyagaurav, Oct 23 '19 at 18:36
The solution to your problem can definitely be found in that post. You can create a function that runs the `.get()` and that function would be your target for the pool, queue, or whatever else you decide to use. — crookedleaf, Oct 23 '19 at 18:37
But which .get? And how i pass this to string to execute xpath in it ? Do I have to use lxml and grequest? To tweak multiple .get() I'll have to create an array with url + xpath? — Pablo Abreu, Oct 23 '19 at 18:49
My question is related not only to "multiprocessing" but also how to perform GET =\ — Pablo Abreu, Oct 23 '19 at 18:50
@Atreyagaurav Please could you give me an example here of how I do this? I have no idea where to start — Pablo Abreu, Oct 23 '19 at 18:55

score 0 · Accepted Answer · answered Oct 24 '19 at 02:50

use threading to do multiple processes, rough code would be

import threading 
my_threads={}
my_threads['request']=threading.Thread(target=request_url)
my_threads['request'].start()

where request_url is a function. this is only example of how to creat and run the thread. to make threads on a for loop

import requests
def start_threads(urls):
     #where urls will be tuple of urls you want to request
     urlThreads=[]
     for url in urls:
        x=threading.Thread(target=request_individual,args=(url,))
        x.start()
        urlThreads.append(x)
        time.sleep(5) #this one is optional, I made it so that it won't bottleneck my router.
def request_individual(url):
   r=requests.get(url)
   #do something

How to request many HTML?

1 Answers1