How can I read and update a variable shared between multiple workers in Python?

For example, I'm scanning through a list of files using multiple processes in Python, and would like to check if the parent directory has been scanned or not.

import multiprocessing
import os

def readFile(filename):
  """ Add the parent folder to the database and process the file
  """

  path_parts = os.path.split(filename)
  dirname = os.path.basename(path_parts[0])
  # shared_variable is the state I would like to share between workers
  if dirname not in shared_variable:
    # Insert into the database
    pass

  # Other file functions


def main():
  """ Walk through files and pass each file to readFile()
  """
  queue = multiprocessing.Queue()
  # init and PATH are not shown here
  pool = multiprocessing.Pool(None, init, [queue])

  for dirpath, dirnames, filenames in os.walk(PATH):

    full_path_fnames = [os.path.join(dirpath, fn) for fn in filenames]
    pool.map(readFile, full_path_fnames)
dano
ensnare
  • This is harder to address than you might think. The direct answer is that you can share mutable state with `multiprocessing.Manager`, but you are going to have some *serious* race conditions doing that if you don't implement some sort of mutex locking condition. If at all possible refactor your code such that your workers don't depend on shared state at all. – roippi Jun 18 '14 at 19:01
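
To illustrate the race condition mentioned in the comment above: checking `dirname not in shared_variable` and then updating it are two separate steps, so two workers can both pass the check before either records the folder. Below is a minimal sketch of one way to guard against that, assuming a `Manager` dict plus a `Manager` lock. The names `claim_directory`, `read_file`, `seen`, and the file paths are made up for illustration and are not from the question.

```python
import multiprocessing
import os
from functools import partial


def claim_directory(seen, lock, dirname):
    """Return True only for the first caller that claims dirname.

    `seen` is any dict-like mapping and `lock` any lock object
    (hypothetical helper, not part of the original question).
    """
    with lock:  # make the membership check and the update one atomic step
        if dirname in seen:
            return False
        seen[dirname] = True
        return True


def read_file(seen, lock, filename):
    """Report whether this call was the first to see the parent folder."""
    dirname = os.path.basename(os.path.dirname(filename))
    is_first = claim_directory(seen, lock, dirname)
    # A real worker would insert into the database here when is_first is True.
    return filename, is_first


if __name__ == "__main__":
    manager = multiprocessing.Manager()
    seen = manager.dict()  # proxy dict shared between processes
    lock = manager.Lock()  # proxy lock that can be passed to pool workers
    files = ["/data/a/1.txt", "/data/a/2.txt", "/data/b/3.txt"]  # made-up paths
    with multiprocessing.Pool(2) as pool:
        results = pool.map(partial(read_file, seen, lock), files)
    # Count how many files were the first one seen for their parent folder.
    print(sum(first for _, first in results))
```

Every check now goes through the lock, so this trades some parallelism for correctness. If the set of parent folders can be computed in the parent process before dispatching work, that avoids shared state altogether, as the comment suggests.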

2 Answers

You can use multiprocessing.Manager to help with this. It allows you to create a list that can be shared between processes:

from functools import partial
import multiprocessing
import os

def readFile(shared_variable, filename):
  """ Add the parent folder to the database and process the file
  """

  path_parts = os.path.split(filename)
  dirname = os.path.basename(path_parts[0])
  # Note: this check followed by an append is not atomic across processes
  if dirname not in shared_variable:
    # Insert into the database
    # Assumed step: record the folder so other workers see it was handled
    shared_variable.append(dirname)

  # Other file functions


def main():
  """ Walk through files and pass each file to readFile()
  """
  manager = multiprocessing.Manager()
  shared_variable = manager.list()  # list proxy shared between processes
  queue = multiprocessing.Queue()
  # init and PATH come from the question's code (not shown there)
  pool = multiprocessing.Pool(None, init, [queue])

  # Bind the shared list as the first argument of readFile
  func = partial(readFile, shared_variable)
  for dirpath, dirnames, filenames in os.walk(PATH):

    full_path_fnames = [os.path.join(dirpath, fn) for fn in filenames]
    pool.map(func, full_path_fnames)

The partial is just used to make it easier to pass shared_variable to each call of readFile, along with each member of full_path_fnames via map.
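
As a small illustration of what `partial` does here, the sketch below uses the built-in `map` instead of `pool.map` so it runs without starting any processes. The `read_file` stand-in and the sample values are hypothetical.

```python
from functools import partial


def read_file(shared_variable, filename):
    """Stand-in for readFile: report which arguments were received."""
    return (len(shared_variable), filename)


seen = ["already_scanned"]       # hypothetical stand-in for the shared list
func = partial(read_file, seen)  # binds shared_variable=seen

# map() supplies only the filename; partial fills in the first argument.
results = list(map(func, ["a.txt", "b.txt"]))
print(results)
```

Calling `func("a.txt")` is equivalent to calling `read_file(seen, "a.txt")`, which is why `pool.map` only needs to iterate over the filenames.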

dano

Have a look at https://docs.python.org/2/library/multiprocessing.html#sharing-state-between-processes. You can use shared memory, via Value or Array, to share data between two or more processes.
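
A minimal sketch of that approach, assuming plain `multiprocessing.Process` workers (the function name `add_to_counter` and the sizes are made up for illustration). Note that `Value` and `Array` hold fixed-size C types, so they suit counters or flags better than a growing list of directory names.

```python
import multiprocessing


def add_to_counter(counter, slots, index):
    """Increment the shared counter and mark one slot in the shared array."""
    with counter.get_lock():  # += on a Value is not atomic without the lock
        counter.value += 1
    slots[index] = 1


if __name__ == "__main__":
    counter = multiprocessing.Value("i", 0)      # shared C int
    slots = multiprocessing.Array("i", [0] * 4)  # shared array of 4 C ints
    workers = [
        multiprocessing.Process(target=add_to_counter, args=(counter, slots, i))
        for i in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value, slots[:])
```

These objects are handed to each `Process` through its `args` when the process is created, which is a different setup from the `Pool.map` calls used in the question.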

Christian Berendt