
I am currently writing a script that stores the headers and cookies whenever the website presents a challenge that I have to solve with my own login solution. Once the login is solved, I want to store the session together with the proxy it used. This is what I have written:

from threading import Thread

import test_by_me

# Save the session to a dict to be re-used
saved_session: dict = {}


def injection(session, response):
    if test_by_me.challenge(session, response):
        solve = test_by_me.solve(session, response)

        # Save the session to a dict to be re-used
        saved_session[session.proxies["https"]] = {
            "headers": session.headers,
            "cookies": session.cookies
        }

        # Return the response
        return solve
    else:
        return response


def create_session():
    return test_by_me.create_scraper(
        Hook=injection,
    )


# ------------------------------------------------------------------------------- #
# Each thread runs this function independently
def from_page(url):
    while True:
        with create_session() as session:
            proxy = proxies.random_proxies()  # Returns a single random proxy string

            session.proxies = {
                'https': proxy
            }

            if proxy in saved_session:
                session.headers = saved_session[proxy]['headers']
                session.cookies = saved_session[proxy]['cookies']

            # Make sure the GET happens inside the session context manager
            resp = session.get(url, timeout=6)
            ...

main:

def main() -> None:
    db_urls = [...]

    # Start threads for each url in the difference
    for url in db_urls:
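        # Pass the function itself plus its arguments; target=from_page(url)
        # would run from_page in the main thread and never start a new one
        Thread(
            target=from_page, args=(url,)
        ).start()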


if __name__ == '__main__':
    main()

However, my problem is that I cannot figure out how to share the variable saved_session with all the threads that are alive, so that I can re-use the same session when proxy = proxies.random_proxies() returns a proxy I have already seen. For example, if it returns 'http://192.168.1.1:1841' and that proxy is already stored in saved_session, I want to re-use that session. How can I do that?

PythonNewbie
  • Does this answer your question? [Python creating a shared variable between threads](https://stackoverflow.com/questions/17774768/python-creating-a-shared-variable-between-threads) – decadenza Aug 23 '22 at 11:05
  • Hmm, not quite @decadenza. I believe my issue could happen because I will add to and update the dict, so maybe there could be a concurrency issue? – PythonNewbie Aug 23 '22 at 11:07
  • Since saved_session is a global mutable object, could you use the method from [How to Share Data explicitly between Threads vs. Processes in Python](https://medium.com/analytics-vidhya/python-tips-multithreading-vs-multiprocessing-data-sharing-tutorial-52743ed48825)? Basically, "When you create a mutable object such as a list or dictionary in the global scope, it shares the same memory address anytime you use it as a thread argument, which is called “Pointer” in lower-level languages like C or C++. " – DarrylG Aug 23 '22 at 13:44
  • @DarrylG I see, and I do understand, but I cannot see how that would work with the code I have provided. Do you perhaps have time to try it with my example and show what you meant? I understand that you want to pass the variable (the mutable object) in args? – PythonNewbie Aug 23 '22 at 15:15
  • @ProtractorNewbie I can't run your example since it's missing some code (e.g. the module test_by_me and the URLs for db_urls). Is the issue having saved_session as an argument for the function from_page? This was just an idea that may not work. – DarrylG Aug 23 '22 at 15:26
  • I don't think it's an issue, but will it be shared by every thread if I send it as an argument? @DarrylG – PythonNewbie Aug 23 '22 at 15:29
  • I haven't tried this myself, so was just going by the article I referenced. – DarrylG Aug 23 '22 at 15:31
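Threads in one Python process share the same module globals, so a module-level saved_session is already visible to every thread without being passed anywhere; passing it in args, as DarrylG suggests, just hands each thread a reference to that same dict. The concurrency worry raised above is about "check, then insert": that spans two operations, so two threads can both see a proxy as missing. The minimal sketch below uses only the standard library; worker, creations and the proxy strings are hypothetical stand-ins for the asker's code.

```python
import threading

# Module-level dict: every thread in this process sees the same object.
saved_session: dict = {}
lock = threading.Lock()
creations = []  # records which thread actually created an entry


def worker(proxy: str, thread_id: int) -> None:
    # The membership test and the insert happen under one lock, so two
    # threads cannot both find the key missing and both "solve" it.
    with lock:
        if proxy not in saved_session:
            saved_session[proxy] = {
                "headers": {"User-Agent": f"thread-{thread_id}"},
                "cookies": {},
            }
            creations.append(thread_id)


# Hypothetical proxies; several threads draw the same one.
proxy_list = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"] * 5
threads = [
    threading.Thread(target=worker, args=(p, i))
    for i, p in enumerate(proxy_list)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(saved_session))  # 2: one entry per distinct proxy
print(len(creations))      # 2: each proxy was "solved" exactly once
```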


Why not inject the session object into each thread's target function and use a threading.Lock when changing the state of shared variables?

Maybe you don't even need to reuse the same session... Still, I would advise against spawning a session per thread; it becomes tricky and inefficient.

import threading

saved_session: dict = {}

lock = threading.Lock()

def from_page(url, session):
    while True:
        proxy = proxies.random_proxies()  # Returns a single random proxy string

        # Use the lock whenever the shared dict or the shared session is changed
        with lock:
            session.proxies = {
                'https': proxy
            }

            if proxy in saved_session:
                session.headers = saved_session[proxy]['headers']
                session.cookies = saved_session[proxy]['cookies']

        # Caveat: the session is shared, so another thread may replace its
        # proxies/headers between the block above and this request
        resp = session.get(url, timeout=6)
        ...

def main() -> None:
    db_urls = [...]
    with create_session() as session:
        # Start one thread per URL, all sharing the same session
        threads = [
            threading.Thread(target=from_page, args=(url, session))
            for url in db_urls
        ]
        for t in threads:
            t.start()
        # Wait here so the session stays open while the threads use it
        for t in threads:
            t.join()


SystemSigma_
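A variation on the answer's lock-based approach keeps every access to the saved state behind one small class, so the hook that writes after a solve and the worker that reads before a request go through the same lock. SessionStore, save and load are hypothetical names, and plain dicts stand in for session.headers and session.cookies so the sketch runs without the asker's test_by_me module. load returns a deep copy, so one thread editing its headers cannot silently change another thread's.

```python
import copy
import threading
from typing import Optional


class SessionStore:
    """Hypothetical thread-safe cache of headers/cookies keyed by proxy URL."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: dict = {}

    def save(self, proxy: str, headers: dict, cookies: dict) -> None:
        # Called after a successful solve (e.g. from the injection() hook).
        # Copies are stored so later edits to the caller's objects do not leak in.
        with self._lock:
            self._data[proxy] = {"headers": dict(headers), "cookies": dict(cookies)}

    def load(self, proxy: str) -> Optional[dict]:
        # Called by a worker before its request; None means "not solved yet".
        # A deep copy is returned so each thread gets objects it can modify freely.
        with self._lock:
            entry = self._data.get(proxy)
            return copy.deepcopy(entry) if entry is not None else None


store = SessionStore()
proxy = "http://192.168.1.1:1841"

# One thread "solves" the challenge and saves the result...
store.save(proxy, {"User-Agent": "solved-client"}, {"token": "abc"})

# ...and other threads that draw the same proxy reuse it.
results = []
results_lock = threading.Lock()


def worker() -> None:
    entry = store.load(proxy)
    with results_lock:
        results.append(entry["cookies"]["token"] if entry else None)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)                          # ['abc', 'abc', 'abc', 'abc']
print(store.load("http://10.0.0.9:1"))  # None
```

If the scraper returned by create_session() is a requests.Session subclass (an assumption about test_by_me), the loaded values could be passed per request, e.g. session.get(url, proxies={'https': proxy}, headers=..., cookies=..., timeout=6), instead of assigning them to the shared session's attributes, which avoids the race noted in the answer's code.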