
This is a problem you might have faced before. I'm trying to process more than one text file holding hashes separated by \r\n (CRLF) line endings. Once one of the processes has compared and found the hash, I want the rest of the processes to exit by breaking out of their loops; there's no reason for them to keep reading the other files when I already have my result.

import os.path
from multiprocessing import Pool
import time
import os
import hashlib

def process_hash_file(password, path):
    ''' Process one file: read each line and search for a given hash '''
    m = hashlib.sha1()
    m.update(password.encode('utf-8'))  # hashlib needs bytes, not str
    password_sha1 = m.hexdigest().upper()
    print("SHA1: " + password_sha1)
    isFound = False
    times_found = ""
    start_time = time.time()
    with open(path) as f_hashes:
        for line in f_hashes:
            # each line looks like HASH:COUNT, terminated by a newline
            hash_sha1, times_found = line.rstrip('\r\n').split(':')
            print('[D] Checking ' + hash_sha1 + " : " + times_found + " against " + password_sha1)
            if hash_sha1 == password_sha1:
                isFound = True
                print(hash_sha1 + " matches password!")
                break
    if isFound:
        print(password + " (" + password_sha1 + ") match found this many times: " + times_found)
    else:
        print("No match was found for: " + password + " (" + password_sha1 + ")")
    print("process took: " + str(time.time() - start_time) + " seconds to finish!")

Now, my problem is that I can find no way of signalling the other processes to stop.

I tried to create a variable with a lock attached to it (a very naive approach) to stop the other processes, but for some reason this fails. I'm aware there is "infrastructure" in Python that facilitates this kind of behavior; I just couldn't find the right tool, or maybe I don't know how to use it correctly to achieve my goal here.

import multiprocessing
import time
import os

mylock = multiprocessing.Lock()
trigger_stop = False


def continue_until_triggered():
    ''' Count slowly towards a large number '''
    print('process id:', os.getpid())
    for num in range(0, 999999):
        time.sleep(1)
        # take the lock before reading the flag
        with mylock:
            if trigger_stop:
                print("trigger was hit, stopping!")
                break


def trigger_after_time_passed(time_passed):
    ''' Makes continue_until_triggered stop by triggering stop'''
    print('process id:', os.getpid())
    time.sleep(time_passed)
    # take the lock before setting the flag
    with mylock:
        trigger_stop = True


if __name__ == '__main__':
    print("starting processes...")
    print('parent process:', os.getppid())
    m1 = multiprocessing.Process(name='continue_until_triggered',
                                 target=continue_until_triggered)
    m1.start()

    m2 = multiprocessing.Process(name='trigger_after_time_passed',
                                 target=trigger_after_time_passed,
                                 args=(5,))
    m2.start()
    print("done processing!")

Output:

starting processes...
parent process: 3500
done processing!
process id: 6540
process id: 3736
[trigger_stop is never set to True, so the process doesn't stop; or I might be deadlocking here]

What I want is a result like this:

Output: 
starting processes...
parent process: 3500
done processing!
process id: 6540
process id: 3736
[trigger_stop is set to True]
trigger was hit, stopping!
[3736 exits]
[6540 exits]
Disane

1 Answer
Normal variables are not shared between processes: every process gets its own copy. You need something that supports shared state, such as a multiprocessing.Event:

https://repl.it/@zlim00/signaling-processes-to-stop-if-another-concurrent-process-ha

import multiprocessing
import time
import os

def continue_until_triggered(mylock, trigger_stop):
    ''' Count slowly towards a large number '''
    print('process id:', os.getpid())
    for num in range(0, 999999):
        time.sleep(1)
        # take the lock before reading the event
        with mylock:
            if trigger_stop.is_set():
                print("trigger was hit, stopping!")
                break


def trigger_after_time_passed(time_passed, mylock, trigger_stop):
    ''' Makes continue_until_triggered stop by triggering stop'''
    print('process id:', os.getpid())
    time.sleep(time_passed)
    # take the lock before setting the event
    with mylock:
        trigger_stop.set()


if __name__ == '__main__':
    print("starting processes...")
    print('parent process:', os.getppid())

    mylock = multiprocessing.Lock()
    trigger_stop = multiprocessing.Event()
    m1 = multiprocessing.Process(name='continue_until_triggered',
                                 target=continue_until_triggered,
                                 args=(mylock, trigger_stop))
    m1.start()

    m2 = multiprocessing.Process(name='trigger_after_time_passed',
                                 target=trigger_after_time_passed,
                                 args=(5, mylock, trigger_stop))
    m2.start()
    print("done processing!")

Output:

starting processes...
parent process: 58648
done processing!
process id: 62491
process id: 62492
trigger was hit, stopping!
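
Applied back to the original hash-file search, the same Event pattern might look like the sketch below. The file names, sample contents, and driver are hypothetical; each worker checks the shared Event once per line and bails out as soon as any sibling has found the hash:

```python
import hashlib
import multiprocessing
import os
import tempfile


def search_hash_file(password_sha1, path, found_event):
    '''Search one hash file for password_sha1; stop early once any
    worker has set found_event.'''
    with open(path) as f_hashes:
        for line in f_hashes:
            if found_event.is_set():      # another worker already matched
                return
            hash_sha1, _, times_found = line.rstrip('\r\n').partition(':')
            if hash_sha1 == password_sha1:
                found_event.set()         # signal the other workers to stop
                print(hash_sha1 + ' found in ' + path +
                      ' (' + times_found + ' occurrences)')
                return


if __name__ == '__main__':
    password_sha1 = hashlib.sha1(b'secret').hexdigest().upper()

    # Two small sample files so the sketch is self-contained; only the
    # second one contains the wanted hash.
    tmpdir = tempfile.mkdtemp()
    paths = [os.path.join(tmpdir, 'a.txt'), os.path.join(tmpdir, 'b.txt')]
    with open(paths[0], 'w') as f:
        f.write('AAAA:1\r\nBBBB:2\r\n')
    with open(paths[1], 'w') as f:
        f.write('CCCC:3\r\n' + password_sha1 + ':42\r\n')

    found_event = multiprocessing.Event()
    workers = [multiprocessing.Process(target=search_hash_file,
                                       args=(password_sha1, p, found_event))
               for p in paths]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print('match found:', found_event.is_set())
```

Note that no Lock is needed around the Event here: is_set() and set() are already process-safe on their own.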
SimonF
  • Hey SimonF, thanks for the answer, I kind of realized this must be the case, I just didn't know what to use. Anyways, unfortunately your solution didn't work for me. I tried it on Python 3.5.1 x86 as well as Python 3.6 x64, but the script got stuck at: ```starting processes... parent process: 3500 done processing! process id: 8144 process id: 4316``` – Disane Jan 21 '19 at 15:05
  • I can't help you with your environment, it works for me and here: https://repl.it/@zlim00/signaling-processes-to-stop-if-another-concurrent-process-ha – SimonF Jan 21 '19 at 15:10
  • I understand, I tested the script under my Ubuntu 16 VM and it worked! It appears under windows multiprocessing might be bugged on the Python versions that I tried. I should probably update and try again. – Disane Jan 21 '19 at 15:11
  • Just tried it on Python 3.7.2 x86, Windows 8, nope, it won't budge. I guess I'll have to run the script in my Linux environment. – Disane Jan 21 '19 at 15:20
  • @Disane I updated the code, try it now. This is the cause of the difference: https://stackoverflow.com/questions/38236211/why-multiprocessing-process-behave-differently-on-windows-and-linux-for-global-o – SimonF Jan 21 '19 at 15:24
  • Your updated code worked! Thank you for the explanation and the links! They are extremely helpful! – Disane Jan 21 '19 at 15:51
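
The start-method difference discussed in the comments can be demonstrated with a small sketch (names are hypothetical): under 'spawn', the Windows default, a child process re-imports the main module instead of inheriting the parent's memory, so a global reassigned inside the `__main__` block never reaches the child:

```python
import multiprocessing

setting = 'module default'  # what a freshly imported copy of the module sees


def report(queue):
    # Under 'spawn' the child re-imports this module; the __main__ block
    # below does not run there, so the child still sees 'module default'.
    queue.put(setting)


if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')  # force the Windows behaviour
    setting = 'changed in parent'              # invisible to spawned children
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=report, args=(queue,))
    p.start()
    print('child saw:', queue.get())  # prints: child saw: module default
    p.join()
```

This is why the working answer passes the Lock and Event as arguments to the Process constructor rather than relying on module-level globals: arguments are explicitly pickled and sent to the child regardless of start method.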