-1

I've been running multiple threads (one per "symbol" below) but have hit a weird issue where data appears to leak from one thread to another depending on which gets processed first. I believe the issue is that I'm using the same field / array name in each thread.

Below is an example of the code I am running to assign values to an array:

for i in range(level+1):
    accounting_price[i] = yahoo_prices[j]['accg'][i][0]

It works fine, but when I query multiple "symbols" and run a thread for each symbol, I sometimes get symbol A's "accounting_price[i]" returned in symbol C's results and vice versa. I'm not sure whether this could be a memory leak from one thread to the other, but the only quick solution I can see is to make "accounting_price[i]" unique to each symbol. Would the below be correct?

symbol = "AAPL"
d = {}
for i in range(level+1):
    d['accounting_price_{}'.format(symbol)][i] = yahoo_prices[j]['accg'][i][0]

When I run it, I get an error.

I would be extremely grateful for a solution on how to dynamically create arrays that are unique to each thread, or alternatively a fix for the "memory leak".

Thanks!

Simon
  • Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation, as suggested when you created this account. [Minimal, complete, verifiable example](http://stackoverflow.com/help/mcve) applies here. We cannot effectively help you until you post your MCVE code and accurately describe the problem. We should be able to paste your posted code into a text file and reproduce the problem you described. – Prune Mar 30 '18 at 16:14
  • I don't see how there could be a memory leak here. But a race condition between two threads that gives incorrect data, sure, that's always possible when you mutate shared objects without a lock. The fix is to acquire a lock around every access to the shared objects—or to redesign your app so that they aren't mutating the same dict (e.g., maybe each builds its own dict, and you merge them after all the threads are done). – abarnert Mar 30 '18 at 16:15
  • The simplest way to create a different list or dict for each thread is to make it a local variable instead of a global, that gets created in each thread after they've started. But if your existing design is too deeply embedded to fix, you can use thread-local storage (see the `threading` docs), or an immutable dict mapping, say, thread IDs to separate copies of the thing you're currently sharing. – abarnert Mar 30 '18 at 16:16
  • Is your program running under a `if __name__ == "__main__":` statement as [shown here](https://stackoverflow.com/a/48552764/3491991)? Not doing that has caused me problems like the one you're experiencing – zelusp Mar 30 '18 at 16:20
  • Thanks all. As @abarnert has mentioned, my suspicion is that it's a race condition. My concern is that if I create a lock and one thread crashes, or takes an absurd amount of time to get a response for the locked variable, all other threads will remain halted. Ideally I wouldn't want this and I would want the other threads to continue. My other workaround was to create an array with an ID for each thread. This is effectively adding an nth dimension to the arrays, but it also means that the variables / declared arrays are unique to each thread – Simon Mar 30 '18 at 16:40
  • @zelusp That's only an issue with multiprocessing (and only if you use spawn or forkserver rather than fork), not with threading. – abarnert Mar 30 '18 at 16:42
  • @Simon If you always acquire the lock in a `with` statement instead of manual acquire/release pairs, "crashing" cannot abandon the lock. An exception will release the lock, while an actual crash (segfault) is going to kill the whole program unless you trap it and turn it into an exception to just kill the thread, in which case you're back to the first case. – abarnert Mar 30 '18 at 16:44
  • @Simon As for a thread taking an absurd amount of time inside the lock—that _can_ happen, and you have to design around it, but in many cases that's very simple. First you do the expensive work (loading and parsing the page) without a lock, and you acquire the lock briefly, just to write the result into the dict or list at the end (and maybe once or twice along the way to read from it). – abarnert Mar 30 '18 at 16:47
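The build-locally-then-merge approach suggested in the comments can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the loop body and values are stand-ins for the Yahoo lookup.

```python
import threading

def worker(symbol, results, results_lock):
    # Build a dict local to this thread; no other thread can touch it,
    # so no lock is needed while we fill it in.
    local_prices = {}
    for i in range(3):  # stand-in for range(level + 1)
        local_prices[i] = i  # stand-in for yahoo_prices[j]['accg'][i][0]
    # Touch the shared dict only once, briefly, under the lock.
    with results_lock:
        results[symbol] = local_prices

results = {}
lock = threading.Lock()
threads = [threading.Thread(target=worker, args=(s, results, lock))
           for s in ("AAPL", "GOOG", "MSFT")]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Each symbol now has its own independent dict in results.
```

Because each thread only writes its own key under the lock, symbol A's data can never end up under symbol C's key.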

2 Answers

0

If you think there’s a race here causing conflicting writes to the dict, using a lock is both the best way to rule that out, and probably the best solution if you’re right.

I should point out that in general, thanks to the global interpreter lock, simple assignments to dict and list members are already thread-safe. But it’s not always easy to prove that your case is one of the “in general” ones.
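As a quick illustration of that point in CPython: four threads appending to one shared list concurrently lose nothing, because each `list.append` is a single operation that completes atomically under the GIL. (This does not extend to read-modify-write patterns like `shared[i] += 1`, which are several operations and can interleave.)

```python
import threading

shared = []

def append_many(n):
    for _ in range(n):
        shared.append(1)  # one atomic list.append per iteration under the GIL

threads = [threading.Thread(target=append_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))  # 40000: no appends lost, even with no lock
```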

Anyway, if you have a mutable global object that’s being shared, you need to have a global lock that’s shared along with it, and you need to acquire that lock around every access (read and write) to the object.

If at all possible, you should do this using a with statement to ensure that it’s impossible to abandon the lock (which could cause other threads to block forever waiting for the same lock).

It’s also important to make sure you don’t do any expensive work, like downloading and parsing a web page, with the lock acquired (which could cause all of your threads to end up serialized instead of usefully running in parallel).

So, at the global level, where you create accounting_price, create a corresponding lock alongside it:

import threading

accounting_price = […etc.…]
accounting_price_lock = threading.Lock()

Then, inside the thread, wherever you use it:

with accounting_price_lock:
    setup_info = accounting_price[...]

yahoo_prices = do_expensive_stuff(setup_info)

with accounting_price_lock:
    for i in range(level+1):
        accounting_price[i] = yahoo_prices[j]['accg'][i][0]

If you end up often having lots of reads and few writes, this can cause excessive and unnecessary contention, but you can fix that by just replacing the generic lock with a read-write lock. They’re a bit slower in general, but a lot faster if a bunch of threads want to read in parallel.
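Python's standard library doesn't ship a read-write lock, but a minimal one can be sketched on top of `threading.Condition`. This version lets writers starve under a steady stream of readers, so treat it as an illustration of the idea rather than production code:

```python
import threading

class RWLock:
    """Minimal reader-writer lock: many readers may hold it at once,
    but a writer gets exclusive access."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0

    def acquire_read(self):
        with self._cond:              # brief critical section to bump the count
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake any waiting writer

    def acquire_write(self):
        self._cond.acquire()          # blocks new readers from entering
        while self._readers:
            self._cond.wait()         # wait for in-flight readers to drain

    def release_write(self):
        self._cond.release()

rw = RWLock()
rw.acquire_read()
rw.acquire_read()   # a second reader can enter concurrently
rw.release_read()
rw.release_read()
rw.acquire_write()  # exclusive once no readers remain
rw.release_write()
```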

abarnert
  • Thanks for the write-up! I tried implementing the way you have it above but still keep getting the same issue. Just a note that I have the same function i.e., def thisfunction(symbol). I create n threads of def thisfunction() based on n different symbol values. The code above sits in thisfunction() but it doesn't appear to be making much of a difference. Would a lock work on one function from which n threads are created? – Simon Mar 30 '18 at 23:56
  • My mistake! I implemented more locks to some more variables (which aren't expensive) and it works! :) thanks – Simon Mar 31 '18 at 00:18
-2

The error is presumably a KeyError, right? It's because you're indexing two levels into your dictionary when only one exists. Try this:

symbol = "AAPL"
d = {}
name = 'accounting_price_{}'.format(symbol)
d[name] = {}  # create the inner dict once, before the loop
for i in range(level+1):
    d[name][i] = yahoo_prices[j]['accg'][i][0]
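If the goal is a dict of per-symbol sub-dicts, `collections.defaultdict` sidesteps the missing-inner-dict problem entirely by creating the inner dict on first access. (The price values below are made-up stand-ins for `yahoo_prices[j]['accg'][i][0]`.)

```python
from collections import defaultdict

d = defaultdict(dict)  # inner dict is created automatically on first access
symbol = "AAPL"
prices = [100.5, 101.2, 99.8]  # stand-in for the Yahoo data
for i, p in enumerate(prices):
    d['accounting_price_{}'.format(symbol)][i] = p

print(d['accounting_price_AAPL'])  # {0: 100.5, 1: 101.2, 2: 99.8}
```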
Personman