
I'm working on my tool.

So I have this function:

import subprocess, os, platform, ctypes, requests, random, threading
from urllib.parse import unquote
from requests.utils import requote_uri
from bs4 import BeautifulSoup as bs

temptotal = 0
totalurl = 0 
retry = 0
load = 0
load2 = 0
loaded = 0
dorksdone = 0
tempourl = 0

#Import Proxy List
selecting = 1
while selecting == 1:
    try:
        option = int(input("Choose proxy type (1 = http, 2 = socks4, 3 = socks5): "))
    except:
        option = 404
 
    if option == 1:
        selecting = 0
        prox = 'http'
        proxyyyy = 'http'
    elif option == 2:
        selecting = 0
        prox = 'socks4'
        proxyyyy = 'socks4'
    elif option == 3:
        selecting = 0
        prox = 'socks5'
        proxyyyy = 'socks5'
    else:
        print("Choose a valid number such as 1, 2 or 3!")
proxy_list = input("Give me Proxylist: ")
with open(proxy_list, mode="r", encoding="utf-8") as mf:
    for line in mf:
        load2 += 1
print(" ")
print("Total Proxy loaded: " + str(load2))
print(" ")

#import keywordfile
dorkslist = input("Give me KeywordList/Dorklist: ")
with open(dorkslist, mode="r", encoding="utf-8") as mf:
    for line in mf:
        load += 1
    
print(" ")
print("Total Dorks loaded:" + str(load)) 
print(" ")

#define url to check
yahoourl = {"https://fr.search.yahoo.com/search?p=&fr=yfp-search-sb",
"https://fr.search.yahoo.com/search?p=&fr=yfp-search-sb&b=11&pz=10"}

#function i want to speed up
def checker():
    global temptotal
    global loaded
    global dorksdone
    global tempourl
    proxy = set()    
    with open(proxy_list, "r") as f:
        file_lines1 = f.readlines()
        for line1 in file_lines1:
            proxy.add(line1.strip())    
    with open(dorkslist, mode="r",encoding="utf-8") as my_file:
        for line in my_file:
            loaded += 1
            threading.Thread(target=titre).start()    
            indorks = line
            encode = requote_uri(indorks)
            for yahoo in yahoourl:
                yahooo = yahoo.replace("&fr",encode + "&fr")
                try:
                    proxies = {
                    'http': prox+'://'+random.choice(list(proxy))
                    }    
                    r = requests.get(yahooo, proxies=proxies)
                    print("Dorks used :" + indorks )
                    dorksdone += 1
                    soup = bs(r.text, 'html.parser')
                    links =  soup.find_all('a')
                    for link in soup.find_all('a'):
                        a = link.get('href')
                        unquote(a)
                        temptotal += 1
                        with open("Bing.txt", mode="a",encoding="utf-8") as fullz:
                            fullz.write(a + "\n")
                        lines_seen = set() # holds lines already seen
                        outfile = open("Bingnodup.txt", "w", encoding="utf-8")
                        for line in open("Bing.txt", "r", encoding="utf-8"):
                            if line not in lines_seen: # not a duplicate                 
                                outfile.write(line)
                                lines_seen.add(line)
                        outfile.close()
                        with open("Bingnodup.txt", mode="r", encoding="utf-8") as cool:
                            for url in cool:            
                                try:
                                    proxies = {
                                    'http': prox+'://'+random.choice(list(proxy))
                                    } 
                                    response = requests.get(url, proxies=proxies)                
                                    save = response.url
                                    with open("Bingtemp.txt", mode="a", encoding="utf-8") as cool1:                    
                                        cool1.write(save + "\n")
                                        tempourl += 1
                                except:
                                    pass
                except:
                    raise
    fin()
           

#start bot

bot1 = threading.Thread(target=checker)

bot1.start()

bot1.join()

Example file for keywords:

python
wordpress

Example file for proxies (http, so choose 1):

46.4.96.137:8080
223.71.167.169:80
219.248.205.117:3128
198.24.171.34:8001
51.158.123.35:9999

But this function is very, very slow when running. Could someone let me know how I can speed it up? I have tried to use this topic: How can I use threading in Python?

But I didn't understand how to apply it to my function the right way.

Akyna
  • Can you please upload an example we can reproduce? So can you give us an example of the proxy list and what is `prox`? – miquelvir Mar 28 '21 at 11:42
  • I have updated my code for you to try. The proxy list is in the format ip:port, for example 185.195.2.63:80. The Dorklist/Keyword list has one keyword per line. Thanks for trying to help me. – Akyna Mar 28 '21 at 11:48
  • Let me know if you need i give you import modules too – Akyna Mar 28 '21 at 11:52
  • It's seems my programs didn't use proxy too – Akyna Mar 28 '21 at 13:17
  • please provide a runnable example; prox is not declared; no example proxy list file, etc. – miquelvir Mar 28 '21 at 15:03
  • 1
    Okay i fully update my post and give u all for you to try – Akyna Mar 28 '21 at 15:44
  • Any solution for me ? ^^ – Akyna Mar 29 '21 at 14:35
  • Post a good MRE and you will have more chances of getting a good solution. Read [this](https://stackoverflow.com/help/minimal-reproducible-example); your example is not minimal by any means. SO is not a coding service, so help us help you. – miquelvir Mar 29 '21 at 14:39
  • What i really need is a good exemple of how to use threading system with custom import i didn't find any solution – Akyna Mar 29 '21 at 14:46
  • https://realpython.com/intro-to-python-threading/ – miquelvir Mar 29 '21 at 14:48

1 Answer


Your script is what's called I/O bound. What this means is that it is not slow because the CPU needs to perform long computations, but because it needs to wait a lot every time it requests a URL (the bottleneck are the requests to the internet).

For concurrency you have 3 options: asyncio, threading, and multiprocessing.

The first two are the ones which can help you in I/O bound problems like yours. The first one (asyncio) is the recommended approach in a problem like this, since there is a library available with support for async/await (aiohttp).
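To illustrate the second option (threads), here is a minimal sketch using only the standard library. The `fetch` function, the URLs, and the 5-worker pool size are made-up placeholders: `fetch` just sleeps to simulate a blocking request, the way `requests.get` blocks in your script.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(url):
    # stand-in for a blocking network request
    time.sleep(0.2)
    return url.upper()

urls = ["http://a.example", "http://b.example", "http://c.example"]

start = time.perf_counter()
# the pool runs the blocking calls in parallel threads;
# pool.map preserves the input order of the results
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start

print(results)  # → ['HTTP://A.EXAMPLE', 'HTTP://B.EXAMPLE', 'HTTP://C.EXAMPLE']
# the three 0.2 s "requests" overlap, so this takes ~0.2 s instead of ~0.6 s
```

Because the threads all wait on I/O at the same time, the total wall time is roughly that of the single slowest request, not the sum of all of them.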

This is an adapted example from the above link, which does exactly what you need:

import asyncio
import os
import platform
import random

import aiohttp


def get_proxies():
    if platform.system() == "Linux":
        clear = lambda: os.system('clear')
        clear()
    if platform.system() == "Windows":
        clear = lambda: os.system('cls')
        clear()
    proxy = set()
    with open("proxy.txt", "r") as f:
        file_lines1 = f.readlines()
        for line1 in file_lines1:
            proxy.add(line1.strip())
    return proxy


async def download_site(session, url, proxy):
    # aiohttp takes a single proxy URL via the `proxy` argument,
    # not a requests-style `proxies` dict
    async with session.get(url, proxy=proxy) as response:
        save = str(response.url)
        with open("Yahootemp.txt", mode="a", encoding="utf-8") as cool1:
            cool1.write(save + "\n")


async def download_all_sites(sites, proxy):
    async with aiohttp.ClientSession() as session:
        tasks = []
        for url in sites:
            task = asyncio.ensure_future(download_site(session, url, proxy))
            tasks.append(task)
        await asyncio.gather(*tasks, return_exceptions=True)


if __name__ == "__main__":
    proxies = get_proxies()
    # `prox` is the scheme you chose in your script ('http', 'socks4' or 'socks5')
    proxy = prox + '://' + random.choice(list(proxies))
    sites = []
    with open("Yahoonodup.txt", mode="r", encoding="utf-8") as cool:
        for url in cool:
            sites.append(url.strip())
    asyncio.get_event_loop().run_until_complete(download_all_sites(sites, proxy))
    

You could make it even faster if saving the files still turns out to be too slow; read this.
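For example, you can push the blocking file writes off the event loop with `asyncio.to_thread` (Python 3.9+), so the coroutines downloading pages never stall on disk I/O. This is just a sketch; the `append_line` helper, the temp-file path, and the two example URLs are made up for illustration:

```python
import asyncio
import os
import tempfile

def append_line(path, text):
    # plain blocking write, executed in a worker thread
    with open(path, mode="a", encoding="utf-8") as f:
        f.write(text + "\n")

async def main():
    path = os.path.join(tempfile.gettempdir(), "demo_results.txt")
    if os.path.exists(path):
        os.remove(path)
    # asyncio.to_thread hands the blocking writes to worker threads,
    # so other coroutines keep running while the disk is busy
    await asyncio.gather(
        asyncio.to_thread(append_line, path, "http://first.example"),
        asyncio.to_thread(append_line, path, "http://second.example"),
    )
    with open(path, encoding="utf-8") as f:
        return len(f.readlines())

count = asyncio.run(main())
print(count)  # → 2
```

In `download_site` above, the `open`/`write` pair could be wrapped the same way.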

miquelvir
  • Okay, thanks a lot for take time to answer i will take care to you're code and i'll come back if i have any trouble – Akyna Mar 27 '21 at 18:51
  • sure, mark as solved if it solves your issue – miquelvir Mar 27 '21 at 18:53
  • I didn't successful to use my program with you're fix for my code... – Akyna Mar 27 '21 at 20:08
  • what issue have you found? I was not able to run it bc you did not provide a minimum reproducible example – miquelvir Mar 27 '21 at 20:17
  • Issue is it's didn't working, like i have try to just import you're code new source and import other module i need, and i didn't get my response.url in my txt file – Akyna Mar 27 '21 at 20:38
  • SO is not a coding service. Please provide what you have tried and which errors have you run into precisely. – miquelvir Mar 27 '21 at 21:09
  • I know but i just try to use code with create an new python file. I run it and it's seems to not work – Akyna Mar 27 '21 at 21:32
  • I have update my code for you try to better understand what i'm trying to do – Akyna Mar 28 '21 at 11:37