
I started from this question, and the approach it describes works fine on its own: How can I recover the return value of a function passed to multiprocessing.Process?

In my case, I would like to write a small tool that connects to many computers and gathers some statistics. Each statistic would be gathered in its own Process to keep things fast. However, as soon as I wrap the multiprocessing calls in a class that represents a machine, it fails.

Here is my code

import multiprocessing
import pprint
import subprocess  # needed by run_task below


def run_task(command):
    p = subprocess.Popen(command, stdout=subprocess.PIPE, universal_newlines=True, shell=False)
    result = p.communicate()[0]
    return result


MACHINE_NAME = "cptr_name"
A_STAT = "some_stats_A"
B_STAT = "some_stats_B"

class MachineStatsGatherer():
    def __init__(self, machineName):
        self.machineName = machineName
        manager = multiprocessing.Manager()
        self.localStats = manager.dict()  # shared resource that the subprocesses write into
        self.localStats[MACHINE_NAME] = machineName

    def gatherStats(self):
        self.runInParallel(
            self.GatherSomeStatsA,
            self.GatherSomeStatsB,
            )
        self.printStats()

    def printStats(self):
        pprint.pprint(self.localStats)

    def runInParallel(self, *fns):
        processes = []
        for fn in fns:
            process = multiprocessing.Process(target=fn, args=(self.localStats))
            processes.append(process)
            process.start()
        for process in processes:
            process.join()

    def GatherSomeStatsA(self, returnStats):
        # do some remote command, simplified here for the sake of debugging
        result = "Windows"
        returnStats[A_STAT] = result.find("Windows") != -1
 
    def GatherSomeStatsB(self, returnStats):
        # do some remote command, simplified here for the sake of debugging
        result = "Windows"
        returnStats[B_STAT] = result.find("Windows") != -1
 

def main():
    machine = MachineStatsGatherer("SOMEMACHINENAME")
    machine.gatherStats()
    return

if __name__ == '__main__':
    main()

And here is the error message

Traceback (most recent call last):
  File "C:\Users\mesirard\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "C:\Users\mesirard\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "d:\workdir\trunks6\Tools\VTKAppTester\Utils\NXMachineMonitorShared.py", line 45, in GatherSomeStatsA
    returnStats[A_STAT] = result.find("Windows") != -1
TypeError: 'str' object does not support item assignment
Process Process-3:
Traceback (most recent call last):
  File "C:\Users\mesirard\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "C:\Users\mesirard\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "d:\workdir\trunks6\Tools\VTKAppTester\Utils\NXMachineMonitorShared.py", line 50, in GatherSomeStatsB
    returnStats[B_STAT] = result.find("Windows") != -1
TypeError: 'str' object does not support item assignment
Franck Mesirard
  • Why use multiprocessing for this job at all? Connecting to computers and gathering statistics is network-bottlenecked, not CPU-bottlenecked, so the cost of serializing/deserializing data to pass it across process boundaries is needless waste. This is a job for threading, not multiprocessing. – Charles Duffy Oct 04 '21 at 11:59
  • (That said: the error message tells you explicitly what the immediate problem is. Your code assumes the argument passed to `GatherSomeStatsA` is a mutable dict, and it's a string instead. However, fixing that to pass a dict in that position is not a good idea, because the property of dicts that a change to one copy changes all other copies _does not hold across process boundaries_ -- when an object is copied to a subprocess, the subprocess's copy is independent of the parent's and changes are not propagated back -- so the general approach being attempted is fatally flawed.) – Charles Duffy Oct 04 '21 at 12:01
  • @CharlesDuffy thanks for your answers. 1) I am using a dict created by multiprocessing.Manager(); I thought that this would make it safe. 2) Why does the code think it is receiving a string when I am passing the dictionary in the args of the process? – Franck Mesirard Oct 05 '21 at 09:24
  • I can answer point 2, and it works now. In the line "process = multiprocessing.Process(target=fn, args=(self.localStats))", I did not add a comma at the end of the args tuple. It should have been process = multiprocessing.Process(target=fn, args=(self.localStats,)) – Franck Mesirard Oct 05 '21 at 09:33
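
The threading suggestion from the first comment could look roughly like the sketch below. It is only an illustration, not the code from the question: the function names gather_stat_a, gather_stat_b and gather_stats are hypothetical, and the "Windows" check is the same placeholder the question uses in place of a real remote command. Each task returns a (key, value) pair, so no shared dict is needed.

```python
from concurrent.futures import ThreadPoolExecutor


def gather_stat_a(machine_name):
    # Placeholder for a remote command, as in the question.
    result = "Windows"
    return "some_stats_A", "Windows" in result


def gather_stat_b(machine_name):
    # Placeholder for a second remote command.
    result = "Windows"
    return "some_stats_B", "Windows" in result


def gather_stats(machine_name, tasks=(gather_stat_a, gather_stat_b)):
    """Run every task in its own thread and merge the results into one dict."""
    stats = {"cptr_name": machine_name}
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(task, machine_name) for task in tasks]
        for future in futures:
            key, value = future.result()  # re-raises any exception from the task
            stats[key] = value
    return stats


if __name__ == "__main__":
    print(gather_stats("SOMEMACHINENAME"))
```

Because threads share memory with the caller, the results come back through the futures and nothing has to be pickled or proxied between processes.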

1 Answer


The issue comes from this line:

process = multiprocessing.Process(target=fn, args=(self.localStats))

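Parentheses alone do not create a tuple, so (self.localStats) is just the dict proxy itself. Process then treats that object as the argument sequence, which is why the target received a string key instead of the dict. A trailing comma turns it into a one-element tuple: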

process = multiprocessing.Process(target=fn, args=(self.localStats,))
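
To see why a string arrived in the target function, here is a small sketch. It assumes, based on CPython's implementation, that Process stores its arguments as tuple(args). The helper describe_args is hypothetical and only mimics that step, and a plain dict stands in for the manager.dict() proxy, which iterates over its keys in the same way.

```python
def describe_args(args):
    """Mimic the assumed normalization step in Process: tuple(args)."""
    return tuple(args)


# Plain dict standing in for the manager.dict() proxy from the question.
stats = {"cptr_name": "SOMEMACHINENAME"}

without_comma = (stats)  # parentheses only: still the dict itself
with_comma = (stats,)    # trailing comma: a one-element tuple holding the dict

print(type(without_comma).__name__)  # dict
print(type(with_comma).__name__)     # tuple

# Iterating a dict yields its keys, so the target would be called with the
# string "cptr_name" instead of the dict.
print(describe_args(without_comma))  # ('cptr_name',)

# With the comma, the dict is passed through intact as the single argument.
print(describe_args(with_comma))     # ({'cptr_name': 'SOMEMACHINENAME'},)
```

This matches the traceback: returnStats was the string "cptr_name", and strings do not support item assignment.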
Franck Mesirard