
I'm working on my tool.

So I have this function:

import subprocess, os, platform, ctypes, requests, random, threading
from urllib.parse import unquote
from requests.utils import requote_uri
from bs4 import BeautifulSoup as bs

temptotal = 0
totalurl = 0 
retry = 0
load = 0
load2 = 0
loaded = 0
dorksdone = 0
tempourl = 0

#Import Proxy List
selecting = 1
while selecting == 1:
    try:
        option = int(input("Choose proxy type (1 = http, 2 = socks4, 3 = socks5): "))
    except:
        option = 404
 
    if option == 1:
        selecting = 0
        prox = 'http'
        proxyyyy = 'http'
    elif option == 2:
        selecting = 0
        prox = 'socks4'
        proxyyyy = 'socks4'
    elif option == 3:
        selecting = 0
        prox = 'socks5'
        proxyyyy = 'socks5'
    else:
        print("Choose a valid number such as 1, 2 or 3!")
proxy_list = input("Give me Proxylist: ")
with open(proxy_list, mode="r", encoding="utf-8") as mf:
    for line in mf:
        load2 += 1
print(" ")
print("Total Proxy loaded: " + str(load2))
print(" ")

#import keywordfile
dorkslist = input("Give me KeywordList/Dorklist: ")
with open(dorkslist, mode="r", encoding="utf-8") as mf:
    for line in mf:
        load += 1
    
print(" ")
print("Total Dorks loaded:" + str(load)) 
print(" ")

#define url to check
yahoourl = {"https://fr.search.yahoo.com/search?p=&fr=yfp-search-sb",
"https://fr.search.yahoo.com/search?p=&fr=yfp-search-sb&b=11&pz=10"}

#function i want to speed up
def checker():
    global temptotal
    global loaded
    global dorksdone
    global tempourl
    proxy = set()    
    with open(proxy_list, "r") as f:
        file_lines1 = f.readlines()
        for line1 in file_lines1:
            proxy.add(line1.strip())    
    with open(dorkslist, mode="r",encoding="utf-8") as my_file:
        for line in my_file:
            loaded += 1
            threading.Thread(target=titre).start()    
            indorks = line
            encode = requote_uri(indorks)
            for yahoo in yahoourl:
                yahooo = yahoo.replace("&fr",encode + "&fr")
                try:
                    proxies = {
                    'http': prox+'://'+random.choice(list(proxy))
                    }    
                    r = requests.get(yahooo, proxies=proxies)
                    print("Dorks used :" + indorks )
                    dorksdone += 1
                    soup = bs(r.text, 'html.parser')
                    links =  soup.find_all('a')
                    for link in soup.find_all('a'):
                        a = link.get('href')
                        unquote(a)
                        temptotal += 1
                        with open("Bing.txt", mode="a",encoding="utf-8") as fullz:
                            fullz.write(a + "\n")
                        lines_seen = set() # holds lines already seen
                        outfile = open("Bingnodup.txt", "w", encoding="utf-8")
                        for line in open("Bing.txt", "r", encoding="utf-8"):
                            if line not in lines_seen: # not a duplicate                 
                                outfile.write(line)
                                lines_seen.add(line)
                        outfile.close()
                        with open("Bingnodup.txt", mode="r", encoding="utf-8") as cool:
                            for url in cool:            
                                try:
                                    proxies = {
                                    'http': prox+'://'+random.choice(list(proxy))
                                    } 
                                    response = requests.get(url, proxies=proxies)                
                                    save = response.url
                                    with open("Bingtemp.txt", mode="a", encoding="utf-8") as cool1:                    
                                        cool1.write(save + "\n")
                                        tempourl += 1
                                except:
                                    pass
                except:
                    raise
    fin()
           

#start bot

bot1 = threading.Thread(target=checker)

bot1.start()

bot1.join()

Example file for keywords:

python
wordpress

Example file for proxies (http, so choose 1):

46.4.96.137:8080
223.71.167.169:80
219.248.205.117:3128
198.24.171.34:8001
51.158.123.35:9999

But this function is very, very slow when running. Could someone let me know how I can speed it up? I have tried to use this topic: How can I use threading in Python?

But I didn't understand how to apply it to my function the right way.

Akyna
  • Can you please upload an example we can reproduce? So can you give us an example of the proxy list and what is `prox`? – miquelvir Mar 28 '21 at 11:42
  • I have updated my code for you to try. The proxy list is in the format ip:port, for example 185.195.2.63:80. The Dorklist/Keyword list has one keyword per line. Thanks for trying to help me. – Akyna Mar 28 '21 at 11:48
  • Let me know if you need i give you import modules too – Akyna Mar 28 '21 at 11:52
  • It's seems my programs didn't use proxy too – Akyna Mar 28 '21 at 13:17
  • please provide a runnable example; prox is not declared; no example proxy list file, etc. – miquelvir Mar 28 '21 at 15:03
  • 1
    Okay i fully update my post and give u all for you to try – Akyna Mar 28 '21 at 15:44
  • Any solution for me ? ^^ – Akyna Mar 29 '21 at 14:35
  • Post a good MRE and you will have more chances of getting a good solution. Read [this](https://stackoverflow.com/help/minimal-reproducible-example); your example is not minimal by any means. SO is not a coding service, so help us help you. – miquelvir Mar 29 '21 at 14:39
  • What i really need is a good exemple of how to use threading system with custom import i didn't find any solution – Akyna Mar 29 '21 at 14:46
  • https://realpython.com/intro-to-python-threading/ – miquelvir Mar 29 '21 at 14:48

1 Answer


Your script is what's called I/O bound. What this means is that it is not slow because the CPU needs to perform long computations, but because it needs to wait a lot every time it requests a URL (the bottleneck are the requests to the internet).

For concurrency you have 3 options: asyncio, threading, and multiprocessing.

The first two are the ones which can help you in I/O bound problems like yours. The first one (asyncio) is the recommended approach in a problem like this, since there is a library available with support for async/await (aiohttp).
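To illustrate the second option (threads), here is a minimal sketch using only the standard library. The `fetch` function, the URLs, and the 5-worker pool size are made-up placeholders: `fetch` just sleeps to simulate a blocking request, the way `requests.get` blocks in your script.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(url):
    # stand-in for a blocking network request
    time.sleep(0.2)
    return url.upper()

urls = ["http://a.example", "http://b.example", "http://c.example"]

start = time.perf_counter()
# the pool runs the blocking calls in parallel threads;
# pool.map preserves the input order of the results
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start

print(results)  # → ['HTTP://A.EXAMPLE', 'HTTP://B.EXAMPLE', 'HTTP://C.EXAMPLE']
# the three 0.2 s "requests" overlap, so this takes ~0.2 s instead of ~0.6 s
```

Because the threads all wait on I/O at the same time, the total wall time is roughly that of the single slowest request, not the sum of all of them.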

This is an adapted example from the above link, which does exactly what you need:

import asyncio
import os
import platform
import random

import aiohttp


def get_proxies():
    if platform.system() == "Linux":
        clear = lambda: os.system('clear')
        clear()
    if platform.system() == "Windows":
        clear = lambda: os.system('cls')
        clear()
    proxy = set()
    with open("proxy.txt", "r") as f:
        file_lines1 = f.readlines()
        for line1 in file_lines1:
            proxy.add(line1.strip())
    return proxy


async def download_site(session, url, proxy):
    # aiohttp takes a single proxy URL via the `proxy` argument,
    # not a requests-style `proxies` dict
    async with session.get(url, proxy=proxy) as response:
        save = str(response.url)
        with open("Yahootemp.txt", mode="a", encoding="utf-8") as cool1:
            cool1.write(save + "\n")


async def download_all_sites(sites, proxy):
    async with aiohttp.ClientSession() as session:
        tasks = []
        for url in sites:
            task = asyncio.ensure_future(download_site(session, url, proxy))
            tasks.append(task)
        await asyncio.gather(*tasks, return_exceptions=True)


if __name__ == "__main__":
    proxies = get_proxies()
    # `prox` is the scheme you chose in your script ('http', 'socks4' or 'socks5')
    proxy = prox + '://' + random.choice(list(proxies))
    sites = []
    with open("Yahoonodup.txt", mode="r", encoding="utf-8") as cool:
        for url in cool:
            sites.append(url.strip())
    asyncio.get_event_loop().run_until_complete(download_all_sites(sites, proxy))
    

You could make it even faster if saving the files still turns out to be too slow; read this.
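For example, you can push the blocking file writes off the event loop with `asyncio.to_thread` (Python 3.9+), so the coroutines downloading pages never stall on disk I/O. This is just a sketch; the `append_line` helper, the temp-file path, and the two example URLs are made up for illustration:

```python
import asyncio
import os
import tempfile

def append_line(path, text):
    # plain blocking write, executed in a worker thread
    with open(path, mode="a", encoding="utf-8") as f:
        f.write(text + "\n")

async def main():
    path = os.path.join(tempfile.gettempdir(), "demo_results.txt")
    if os.path.exists(path):
        os.remove(path)
    # asyncio.to_thread hands the blocking writes to worker threads,
    # so other coroutines keep running while the disk is busy
    await asyncio.gather(
        asyncio.to_thread(append_line, path, "http://first.example"),
        asyncio.to_thread(append_line, path, "http://second.example"),
    )
    with open(path, encoding="utf-8") as f:
        return len(f.readlines())

count = asyncio.run(main())
print(count)  # → 2
```

In `download_site` above, the `open`/`write` pair could be wrapped the same way.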

miquelvir
  • Okay, thanks a lot for take time to answer i will take care to you're code and i'll come back if i have any trouble – Akyna Mar 27 '21 at 18:51
  • sure, mark as solved if it solves your issue – miquelvir Mar 27 '21 at 18:53
  • I didn't successful to use my program with you're fix for my code... – Akyna Mar 27 '21 at 20:08
  • what issue have you found? I was not able to run it bc you did not provide a minimum reproducible example – miquelvir Mar 27 '21 at 20:17
  • Issue is it's didn't working, like i have try to just import you're code new source and import other module i need, and i didn't get my response.url in my txt file – Akyna Mar 27 '21 at 20:38
  • SO is not a coding service. Please provide what you have tried and which errors have you run into precisely. – miquelvir Mar 27 '21 at 21:09
  • I know but i just try to use code with create an new python file. I run it and it's seems to not work – Akyna Mar 27 '21 at 21:32
  • I have update my code for you try to better understand what i'm trying to do – Akyna Mar 28 '21 at 11:37