
I am trying to generate traffic on the network by opening a large list of sites read from a text file.

For each entry I want to fetch the site, collect all of its href links, visit those links and then the site itself, and then move on to the next site in the text file.

The problem I have noticed is that these statements take a while to execute, upwards of 5 seconds per curl. Is this because of my excessive use of try/except blocks? I'm just trying to understand where the problem may be. Sample output (note the roughly five-second gap between requests):

2018-03-14 16:30:32.590135

http://www.ipostparcels.com/parcel-delivery/amazon-parcel-delivery

2018-03-14 16:30:37.653522

http://www.ipostparcels.com/parcel-delivery/abot-ipostparcels

2018-03-14 16:30:42.716842

http://www.ipostparcels.com/parcel-delivery/parcel-delivery-rates

2018-03-14 16:30:47.762127

http://www.ipostparcels.com/parcel-delivery/parcel-collection-and-delivery

2018-03-14 16:30:52.809792

http://www.ipostparcels.com/parcel-delivery/post-for-a-post

2018-03-14 16:30:57.876936

http://www.ipostparcels.com/parcel-delivery/discont-codes-and-offers

2018-03-14 16:31:02.947123

http://www.ipostparcels.com/corier/ebay-corier-service

#!/usr/bin/python
from bs4 import BeautifulSoup
import urllib2
import pycurl
from io import BytesIO
import os
import re
import sys
import random
from datetime import datetime

links = []

while True:
    with open("topdomains3.txt", "r") as f:
        domains = list(f)
        # pick a random starting index into the domain list (randint is inclusive at both ends)
        joker = random.randint(0, len(domains) - 1)
        for i in domains[joker:]:
            i = i.replace("\n", "")
            i = i.replace("None", "")
            i = i.rstrip()
            print i
            try:
                c = pycurl.Curl()
                c.setopt(c.URL, i)
                c.setopt(pycurl.TIMEOUT, 3)
                c.setopt(c.FOLLOWLOCATION, True)
                c.setopt(c.MAXREDIRS, 5)
                try:
                    # fetch the page a second time with urllib2 so BeautifulSoup can parse it
                    i = 'http://' + i
                    html_page = urllib2.urlopen(i)
                    soup = BeautifulSoup(html_page, 'html5lib')
                except Exception, e:
                    print e
                    continue
                # collect every absolute href on the page
                for link in soup.findAll('a', attrs={'href': re.compile("^http")}):
                    links.append(link.get('href'))  # the href is already a plain string, no u'' prefix to strip
                # curl each collected link
                for a in links:
                    try:
                        print "----------------------------------------------------------"
                        print str(datetime.now())
                        print a
                        d = pycurl.Curl()
                        #d.setopt(d.VERBOSE, True)
                        d.setopt(d.URL, str(a))
                        #d.setopt(d.WRITEDATA, buffer)
                        d.setopt(d.TIMEOUT, 3)
                        d.setopt(d.FOLLOWLOCATION, True)
                        d.setopt(d.MAXREDIRS, 5)
                        #d.setopt(pycurl.WRITEFUNCTION, lambda x: None)
                        d.perform()
                        d.close()
                    except pycurl.error:
                        continue
                # finally curl the domain itself
                c.perform()
                c.close()
            except pycurl.error:
                continue
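
To narrow down where each request spends its time, one thing I could do is read libcurl's per-phase timers after a transfer. A minimal sketch (timed_fetch is just a helper name I made up, http://example.com is a placeholder URL, and the response body is discarded with a write callback as in the commented-out line above):

import pycurl

def timed_fetch(url):
    # same options as in the script above
    c = pycurl.Curl()
    c.setopt(c.URL, url)
    c.setopt(c.TIMEOUT, 3)
    c.setopt(c.FOLLOWLOCATION, True)
    c.setopt(c.MAXREDIRS, 5)
    c.setopt(c.WRITEFUNCTION, lambda data: None)  # discard the body
    c.perform()
    # libcurl's timers, in seconds measured from the start of the transfer
    print "dns lookup:   %.3f" % c.getinfo(pycurl.NAMELOOKUP_TIME)
    print "connect:      %.3f" % c.getinfo(pycurl.CONNECT_TIME)
    print "first byte:   %.3f" % c.getinfo(pycurl.STARTTRANSFER_TIME)
    print "total:        %.3f" % c.getinfo(pycurl.TOTAL_TIME)
    c.close()

timed_fetch("http://example.com")

Since the counters are cumulative from the start of the transfer, comparing them should show whether the time goes to DNS, connecting, or waiting for the server rather than to the surrounding Python code.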

Any assistance would be appreciated.

  • What's the reason for using PyCurl over something like requests? – G_M Mar 14 '18 at 21:20
  • I had just heard that pycurl was faster for getting the result: https://stackoverflow.com/questions/15461995/python-requests-vs-pycurl-performance – Spyderz Mar 14 '18 at 21:33
  • The urllib2 section is just for Beautiful Soup to grab the href links on the page. The script as a whole is just for traffic generation, so for the most part I don't care about the response from the web server. – Spyderz Mar 14 '18 at 22:51
  • I am trying to fill up an HTTP traffic logging device. It's not necessarily pycurl that takes 5 seconds; each iteration of the for loop just takes about 5 seconds. Since pycurl is faster, I have it do most of the requests and urllib only gets the href links. – Spyderz Mar 14 '18 at 23:10
  • Have you thought about using [`Scrapy`](https://scrapy.org/)? It's asynchronous and would probably be able to send a lot more traffic and grab the links for you too. Or do your requests have to be one after the other (synchronous)? – G_M Mar 14 '18 at 23:11
  • I found and considered Scrapy once I had most of the script done. I didn't know it was async. Can you use async with Scrapy on Python 2.7, or is that exclusive to 3.4+? – Spyderz Mar 14 '18 at 23:18
  • Scrapy works on both 2 & 3 (I think it uses twisted for async). Yeah, [2.7 and 3.4+](https://docs.scrapy.org/en/latest/intro/install.html#installing-scrapy) – G_M Mar 14 '18 at 23:20
  • Thanks, I'll have to try it out. Async would definitely help generate traffic faster. Do you have any links to example code that would help me out in my situation? (A minimal sketch follows after this thread.) – Spyderz Mar 14 '18 at 23:26
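
A minimal, untested sketch of the Scrapy idea from the comments above. The spider name, the concurrency value, and the file name traffic_spider.py are my own placeholders; it reuses the topdomains3.txt list from the script:

import scrapy


class TrafficSpider(scrapy.Spider):
    # crawl every domain in topdomains3.txt and request each absolute link it finds,
    # letting Scrapy's asynchronous engine overlap the requests
    name = "traffic"
    custom_settings = {
        "DOWNLOAD_TIMEOUT": 3,
        "REDIRECT_MAX_TIMES": 5,
        "CONCURRENT_REQUESTS": 32,  # placeholder value, tune as needed
    }

    def start_requests(self):
        with open("topdomains3.txt") as f:
            for line in f:
                domain = line.strip()
                if domain and domain != "None":
                    yield scrapy.Request("http://" + domain, callback=self.parse)

    def parse(self, response):
        # follow every absolute href on the page; the link responses themselves are ignored
        for href in response.xpath("//a/@href").extract():
            if href.startswith("http"):
                yield scrapy.Request(href, callback=self.ignore)

    def ignore(self, response):
        pass

It should run without creating a full project via: scrapy runspider traffic_spider.py. Failed requests are logged by Scrapy and the crawl continues, so the per-request try/except blocks shouldn't be needed.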
