
I'm trying to run multiple functions simultaneously (strictly speaking they are methods, because they belong to a class):

from sh import tail
data = {}
class my_Class():
    def __init__(self):
        """Nothing to declare for initializing"""
    def get_data(self, filepath):
        """I'm trying to import the data from several files"""
        for line in tail("-f", "-n 1", filepath, _iter=True):
            data[filepath] = line
            print(data)
my_Class().get_data("path/to/file") #call 1
my_Class().get_data("path/to/another/file") #call 2
# ... 14 similar calls

I want each call to add its data to the dictionary, so that when I call:

my_Class().get_data("path/to/file") #call 1
my_Class().get_data("path/to/another/file") #call 2
# ... 14 similar calls

The result should print:

#1 {'filepath1' : line}
#2 {'filepath1' : line, 'filepath2' : line}
#3 {'filepath1' : line, 'filepath2' : line, 'filepath3' : line}
# ... 13 more

At the same time, I want the contents of the `data` dictionary to keep changing dynamically, because the data in the files keeps changing. For example:

#1 {'filepath1' : line}
#2 {'filepath1' : last_line_in_the_file, 'filepath2' : line}
#3 {'filepath1' : last_line_in_the_file, 'filepath2' : last_line_in_the_file, 'filepath3' : line}
# ... 13 more

I've already checked these posts, but they don't do what I'm asking: "Python: How can I run python functions in parallel?" and "How to do parallel programming in Python". Thank you! Please tell me if anything is unclear.

Atizva
  • If the functions are running in parallel, at the same time, what does "inherit from the previous function" mean? What's the previous function? And how can you inherit its data when it hasn't finished producing that data yet? – abarnert Jul 10 '18 at 22:44
  • @abarnert "Inherit from the previous function" means that each call, starting from #2, inherits the data from the previous call, i.e. call #2 inherits data from call #1. So #2 will print `{'filepath1' : line, 'filepath2' : line}`, where the `'filepath1' : line` part is inherited from call #1, and so on for all 14 calls. As for how I can inherit if the function hasn't finished processing: the function only reads the data from a file, and tail makes sure it gets the most recent data. – Atizva Jul 10 '18 at 23:09

1 Answer


It sounds like you're asking for two things here:

  1. How to run tasks in parallel, and
  2. How to share (mutable) values between those tasks.

For the first one, the answer is, as you suspected, threads. For some programs, that isn't the answer, because they're spending most of their time doing heavy CPU computation in Python, or because you need thousands of tasks rather than a handful. But here, just run each one as a thread.

Instead of this:

my_Class().get_data("path/to/file") #call 1
my_Class().get_data("path/to/another/file") #call 2

… you create your threads:

import threading

t1 = threading.Thread(target=my_Class().get_data, args=("path/to/file",))
t2 = threading.Thread(target=my_Class().get_data, args=("path/to/another/file",))

… then start them:

t1.start()
t2.start()

… then wait for them all to finish (which, in this case, will obviously take forever, so you could simplify things here…):

t1.join()
t2.join()
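
With more than a couple of files, writing out `t1`, `t2`, … by hand gets tedious. Here is a minimal sketch of the same create/start/join pattern in a loop. It assumes a hypothetical `fake_get_data` stand-in that stores one value and returns, instead of following a file with `tail -f`; the paths are placeholders as well.

```python
import threading

data = {}

def fake_get_data(filepath):
    # Hypothetical stand-in for my_Class().get_data: it stores one value and
    # returns, whereas the real method loops forever over tail -f output.
    data[filepath] = "last line of " + filepath

# Placeholder paths, not real files.
filepaths = ["path/to/file", "path/to/another/file"]

# Create one thread per file.
threads = [threading.Thread(target=fake_get_data, args=(fp,)) for fp in filepaths]

# Start them all, then wait for them all.
for t in threads:
    t.start()
for t in threads:
    t.join()

print(data)
```

With the real `get_data`, `join()` never returns. Passing `daemon=True` to `threading.Thread` is one option if you don't want the worker threads to keep the process alive once the main thread exits.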

Now, how do you share mutable data between threads?

To start with, you can just access and mutate the same values from the different threads. But in general, you'll want to put a lock around each value, unless you know that you don't need one.

If you only care about CPython on Windows, macOS, Linux, and BSD, inserting a string value with a string key into a dict object is one of those things that doesn't need a lock, because the global interpreter lock (GIL) makes that single operation atomic. And printing to stdout is another one. And those are the only things you're sharing. So, you actually don't need any locks here; things will just work.

But, since you probably didn't know that dicts were safe in this way, let's see how you'd use a lock.

data = {}
data_lock = threading.Lock()

# etc.

def get_data(self, filepath):
    """I'm trying to import the data from several files"""
    for line in tail("-f", "-n 1", filepath, _iter=True):
        with data_lock:
            data[filepath] = line
            print(data)

That's all there is to it.
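
To try the locked version without `sh` or real files, here is a self-contained sketch. `fake_tail` is a hypothetical stand-in that yields a few lines and stops; the real `tail(..., _iter=True)` call would keep yielding as the file grows.

```python
import threading

data = {}
data_lock = threading.Lock()

def fake_tail(filepath):
    # Hypothetical replacement for tail("-f", "-n 1", filepath, _iter=True).
    for i in range(3):
        yield f"{filepath} line {i}"

class my_Class:
    def get_data(self, filepath):
        for line in fake_tail(filepath):
            # Only one thread at a time may update and print the shared dict.
            with data_lock:
                data[filepath] = line
                print(data)

# One thread per (placeholder) path, each with its own my_Class instance.
threads = [
    threading.Thread(target=my_Class().get_data, args=(fp,))
    for fp in ("path/to/file", "path/to/another/file")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The order of the printed lines depends on how the threads are scheduled, but once both threads finish, `data` holds the last line seen for each path.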

Things can get a bit more complicated. For example, you don't really need to hold the lock for as long as you're doing here. If you had 30 threads, there'd be a good chance that one of them is trying to grab the lock to add a new value, while another thread had already made a string out of data, and was taking its time printing that string (printing to stdout is pretty slow), but hadn't released the lock yet. If so, you could get a bit more parallelism by breaking things down:

def get_data(self, filepath):
    """I'm trying to import the data from several files"""
    for line in tail("-f", "-n 1", filepath, _iter=True):
        with data_lock:
            data[filepath] = line
        with data_lock:
            datastr = str(data)
        print(datastr)

But that's really as complicated as it gets. The hard part about threading is when you have to compose separate locks because you have separate data that's getting passed around between threads and so on. For simple cases like this, it's actually pretty simple.
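
One possible variation on the split-up version, which is not from the code above: update the dict and take a copy of it in a single lock acquisition, then print the copy with the lock released. In this sketch `record` and `worker` are hypothetical helper names, and the lines come from a plain list rather than from `tail`.

```python
import threading

data = {}
data_lock = threading.Lock()

def record(filepath, line):
    # Hypothetical helper: update the shared dict and return a private copy.
    with data_lock:
        data[filepath] = line
        snapshot = dict(data)  # shallow copy taken while the lock is held
    return snapshot

def worker(filepath, lines):
    for line in lines:
        snapshot = record(filepath, line)
        # The slow print happens outside the lock, on a copy that no other
        # thread mutates.
        print(snapshot)

threads = [
    threading.Thread(target=worker, args=(fp, [f"{fp} line {i}" for i in range(3)]))
    for fp in ("path/to/file", "path/to/another/file")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The trade-off is an extra dict copy per update, which is cheap for a handful of keys but would add up for a very large dict.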

abarnert
  • Exactly what I was trying to do, thank you very much <3 !! I needed the 'Threading', but didn't know how to use it in code. As for the lock thing; Good to know – Atizva Jul 11 '18 at 01:26
  • @Atizva The docs for the [`threading`](https://docs.python.org/3/library/threading.html) module are a great reference, but as a tutorial… they kind of assume you already know Java threading and pthreads in C and just want to know how to do the same thing in Python, which probably isn't that helpful to you. I'm sure there are good third-party tutorials out there that can help, but I don't know any in particular. – abarnert Jul 11 '18 at 02:15