Your main problem is that you use `imap` instead of `map`. `imap` is non-blocking, which means that your main process quits before the worker processes have finished. I'm a bit surprised that it worked sometimes, as I think it should never have worked.
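The blocking difference described above can be sketched as follows. This is a minimal illustration, not your program: it uses `multiprocessing.pool.ThreadPool`, which exposes the same `map`/`imap` interface as `Pool` but runs its workers in threads, so the snippet behaves the same on any OS. The `slow_square` function and the `release` event are made up for the demo: every task waits on the event, so the fact that `imap` returns at all shows it did not wait for the work.

```python
import threading
from multiprocessing.pool import ThreadPool

# Hypothetical gate: every task blocks until the main thread sets it.
release = threading.Event()


def slow_square(x):
    release.wait()  # no task can finish before release.set() is called
    return x * x


def demo():
    with ThreadPool(4) as pool:
        # imap hands back a lazy iterator right away; no task has finished yet.
        it = pool.imap(slow_square, range(5))
        returned_early = not release.is_set()
        release.set()  # let the workers proceed
        # Results only arrive once the iterator is consumed.
        imap_results = list(it)
        # map blocks until every task is done and returns a plain list.
        map_results = pool.map(slow_square, range(5))
    return returned_early, imap_results, map_results


returned_early, imap_results, map_results = demo()
print(returned_early, imap_results, map_results)
# True [0, 1, 4, 9, 16] [0, 1, 4, 9, 16]
```

With a process `Pool`, leaving the `with` block (or letting the main program end) right after the `imap` call without consuming the iterator terminates the workers, so pending tasks may never run.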
Apart from that, there are a few other problems with your program:
- the `check` function, running in different processes, shares one file handle and iterates over its lines. This only works by chance (it will not work on Windows, for instance) and is bad practice (to put it mildly). You should read the file first and then distribute the lines to the processes.
- the same applies to writing to the file. Although appending to a file is usually safe even across processes, a better design is to do all the writing at the end, in the parent process.
- `map` and `imap` are meant to apply a function to a list of arguments and return the results (the mapped values), so `check` should return its result instead of writing it to the file itself.
- leave out `processes=20` so Python picks the number of processes based on how many cores your computer has (`Pool()` without arguments uses `os.cpu_count()`).
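The advice in the list above boils down to one pattern: the parent reads all input, the workers only receive plain strings and return values, and the parent writes the output. Below is a minimal offline sketch of that pattern. The `TAKEN` set and the in-memory `io.StringIO` files are hypothetical stand-ins for the Twitch lookup and the two text files, and it uses `multiprocessing.pool.ThreadPool` (same `map` API as `Pool`) so it runs anywhere; like `Pool()`, `ThreadPool()` with no argument starts `os.cpu_count()` workers.

```python
import io
from multiprocessing.pool import ThreadPool

# Hypothetical stand-in for the HTTP lookup: these names count as taken.
TAKEN = {"alice", "bob"}


def check(line):
    name = line.strip()
    return None if name in TAKEN else name  # None means "taken"


def run(accounts_file, valid_file):
    # 1. The parent reads everything up front; workers never touch the file.
    lines = [line for line in accounts_file if line.strip()]
    # 2. Workers only receive plain strings and return a value.
    with ThreadPool() as pool:  # no argument: os.cpu_count() workers
        results = pool.map(check, lines)
    # 3. Only the parent writes the output.
    for name in results:
        if name is not None:
            valid_file.write(name + "\n")


accounts = io.StringIO("alice\ncarol\n\nbob\ndave\n")
valid = io.StringIO()
run(accounts, valid)
print(valid.getvalue())
```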
Based on these points, this is the code I propose:
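```python
# program checks twitch accounts in a file.
# writes accounts which aren't taken to another file.
import requests
from multiprocessing import Pool

base_url = "https://www.twitch.tv/"


def check(line):
    # Strip the trailing newline so it doesn't end up in the URL.
    name = line.strip()
    twitch_r = requests.get(base_url + name)
    if twitch_r.status_code == 404:
        return name
    # Implicitly returns None for every account that exists (non-404).


def main():
    # Read all account names in the parent process first (skip blank lines)...
    with open('accounts.txt', 'r') as accounts:
        lines = [line for line in accounts if line.strip()]

    # ...then distribute them to the worker processes.
    # Pool() without arguments uses os.cpu_count() processes.
    with Pool() as p:
        results = p.map(check, lines)  # blocks until every line is checked

    # Drop the None entries (accounts that are taken).
    results = [r for r in results if r is not None]

    # Only the parent process writes to the output file.
    with open('valid accounts.txt', 'a') as valid_accounts:
        for result in results:
            valid_accounts.write(result + '\n')


if __name__ == "__main__":
    main()
```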
The only thing to note is that you need to filter out the `None` values in `results`, because `check(line)` returns `None` for every URL whose response is not a 404.
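A short illustration of the filtering step, with made-up `results` values. The comparison uses `is not None`, as PEP 8 recommends for singletons, and it differs from a plain truthiness filter: `filter(None, ...)` would also drop falsy values such as an empty string, whereas `is not None` removes only the `None` placeholders.

```python
# Hypothetical output of pool.map(check, lines): None marks a taken account.
results = ["carol", None, "dave", None, ""]

# Keep everything that is not None (PEP 8 prefers `is not None` over `!= None`).
available = [r for r in results if r is not None]

# filter(None, ...) drops every falsy value, including the empty string.
truthy = list(filter(None, results))

print(available)  # ['carol', 'dave', '']
print(truthy)     # ['carol', 'dave']
```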
Updates:
> After using John's solution, the program is working as intended
I doubt it does. Since you are on Windows, every process has its own file handle pointing to `accounts.txt` and will cycle through all the lines. So you end up checking every URL 20 times, and the multiprocessing gains you nothing.
> I used imap because I read that imap doesn't return a list (?)
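`imap` does return a lazy iterator rather than a list, but that isn't what matters here. In this situation the only relevant difference between `map` and `imap` is that `map` waits until all tasks are done (thus, you don't need to call `join`).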
For a more thorough discussion of `map` vs. `imap`, see here.
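If you prefer `imap` so that results are handled one at a time instead of being collected into a list first, the key is to consume the iterator before the pool shuts down. A minimal sketch, assuming a hypothetical offline `check` (the `TAKEN` set stands in for the HTTP request) and using `multiprocessing.pool.ThreadPool`, which has the same `imap` API as `Pool`:

```python
from multiprocessing.pool import ThreadPool

# Hypothetical stand-in for the HTTP lookup: these names count as taken.
TAKEN = {"alice", "bob"}


def check(name):
    return None if name in TAKEN else name


def stream_available(names):
    found = []
    with ThreadPool(4) as pool:
        # Iterate inside the with block: results arrive in input order as they
        # become available, and the pool stays alive until the loop ends.
        for result in pool.imap(check, names):
            if result is not None:
                found.append(result)  # e.g. write it to the output file here
    return found


print(stream_available(["alice", "carol", "bob", "dave"]))  # ['carol', 'dave']
```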