
I have written a Frequent Pattern Growth (FP-Growth) algorithm. After building the full tree, which is quite large, I figured that the counting part, which is the most resource-heavy one, could be parallelized.

An FP tree is a non-binary tree structure in which the same name can appear at different nodes. Each node is linked to the next node with the same name, so the tree can be traversed not only from the root but also sideways, following a link path that connects all nodes with the same name. From each node on such a path, the algorithm walks back up to the root, building and counting the possible combinations along that path.
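
A minimal sketch of the structure described above, assuming the transactions are already filtered and sorted by item frequency as FP-growth usually requires. The names `FPNode`, `build_tree`, `prefix_paths` and the attribute `node_link` are hypothetical and not taken from the original code: each node keeps a `parent` pointer for the walk back to the root and a `node_link` pointer to the next node with the same name.

```python
class FPNode:
    """One node of an FP tree (hypothetical minimal layout)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.count = 0
        self.parent = parent      # pointer back towards the root
        self.children = {}        # name -> FPNode
        self.node_link = None     # next node in the tree with the same name


def build_tree(transactions):
    """Insert each (already ordered) transaction; return root and header table."""
    root = FPNode(None)
    header = {}                   # name -> first node on that name's link path
    last = {}                     # name -> last node on that name's link path
    for items in transactions:
        node = root
        for name in items:
            child = node.children.get(name)
            if child is None:
                child = FPNode(name, parent=node)
                node.children[name] = child
                # Append the new node to the side link path for this name.
                if name in last:
                    last[name].node_link = child
                else:
                    header[name] = child
                last[name] = child
            child.count += 1
            node = child
    return root, header


def prefix_paths(header, name):
    """Follow the link path for `name`; walk each node back to the root."""
    paths = []
    node = header.get(name)
    while node is not None:
        path = []
        parent = node.parent
        while parent is not None and parent.name is not None:
            path.append(parent.name)
            parent = parent.parent
        # Path from the root down to (but excluding) the node, with its count.
        paths.append((tuple(reversed(path)), node.count))
        node = node.node_link
    return paths


root, header = build_tree([["a", "b", "c"], ["a", "c"], ["b", "c"]])
print(prefix_paths(header, "c"))
```

Here the three `c` nodes are reached through `node_link` alone, without searching the tree from the root.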

By splitting up the names to follow among several processes, I figured I could parallelize the counting part with multiprocessing, since the algorithm only follows certain paths in the tree and does the counting and combining without altering the tree.
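
One way the split could look, sketched under the assumption that the parent process first collects each name's prefix paths as plain `(path_tuple, count)` pairs, so that the workers receive flat tuples instead of the linked tree itself. `count_combinations` and `parallel_count` are hypothetical helpers; like the description above, they enumerate every combination on a path rather than building conditional FP trees.

```python
from collections import Counter
from itertools import combinations
from multiprocessing import Pool


def count_combinations(job):
    """Count all item combinations ending in `item` for one name.

    `job` is (item, paths) where `paths` is a list of (prefix_tuple, count)
    pairs. These are flat tuples, so they pickle cheaply.
    """
    item, paths = job
    counts = Counter()
    for path, count in paths:
        # Every subset of the prefix path (including the empty one) combined
        # with the item itself occurs `count` times along this path.
        for r in range(len(path) + 1):
            for combo in combinations(path, r):
                counts[combo + (item,)] += count
    return counts


def parallel_count(jobs, nproc=2):
    """Spread the per-name jobs over `nproc` worker processes and merge."""
    pool = Pool(processes=nproc)
    try:
        partials = pool.map(count_combinations, jobs)
    finally:
        pool.close()
        pool.join()
    total = Counter()
    for partial in partials:
        total.update(partial)   # Counter.update adds counts
    return total
```

For example, `parallel_count([("c", [(("a", "b"), 1), (("a",), 1)])])` counts `("c",)` twice. On Windows the call has to sit under `if __name__ == "__main__":` as in the question, and a path of length n yields 2**n combinations, so long paths dominate the cost.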

But now I'm hitting a roadblock: an unexpected error from multiprocessing that I don't understand. Take the following sample code. It has a class called `link` which just holds a reference to another instance of itself. I build two chains of such objects with a given depth, that is, each `link.link` attribute contains the next `link` instance, while that of the first one created is None.

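```python
import multiprocessing as mpr

class link:
    def __init__(self, link=None):
        self.link = link   # reference to the next (inner) link, or None

def main(maxRange):
    nproc = 2
    L = []
    for i in range(nproc):
        # Start a chain with a link whose .link is None, then wrap it
        # maxRange times so each new head points to the previous one.
        L.append(link())
        for j in range(maxRange):
            L[i] = link(L[i])
    LL = [L[:1], L[1:]]   # one chain per worker process
    pool = mpr.Pool(processes=nproc)
    # pool.map pickles each element of LL to send it to a worker.
    pool.map(test, LL)
    pool.close()
    pool.join()
    return

def test(l):
    pass

if __name__ == "__main__":
    maxRange = 328
    main(maxRange)
```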

Once maxRange reaches 328, I get this error:

```
  File "E:/PythonDir/Diverses/temp.py", line 896, in main
    pool.map(test,dfl)
  File "C:\Users\...\AppData\Local\Continuum\Anaconda2\lib\multiprocessing\pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "C:\Users\...\AppData\Local\Continuum\Anaconda2\lib\multiprocessing\pool.py", line 567, in get
    raise self._value
RuntimeError: maximum recursion depth exceeded while getting the str of an object
```
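
`pool.map` pickles every element of the iterable to hand it to a worker process, so a likely culprit is the pickler recursing once per nested `link` (the object, then its `__dict__`, then the next object). The sketch below tries to reproduce the failure with `pickle.dumps` alone, assuming Python 3, where the error is `RecursionError` (a subclass of `RuntimeError`); `make_chain` and `pickles_ok` are hypothetical helpers. If pickling takes a few nested calls per level and the default recursion limit is 1000, a failure at a depth of a few hundred, such as 328, would be expected, though the exact threshold varies with the Python version.

```python
import pickle


class link:
    def __init__(self, link=None):
        self.link = link


def make_chain(depth):
    """Build the same kind of chain as in the question, iteratively."""
    node = link()
    for _ in range(depth):
        node = link(node)
    return node


def pickles_ok(depth):
    """Return True if a chain of this depth survives pickle.dumps."""
    try:
        pickle.dumps(make_chain(depth))
        return True
    except RecursionError:
        return False


print(pickles_ok(10))      # shallow chain: pickles fine
print(pickles_ok(100000))  # deep chain: expected to exceed the limit
```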

Why does multiprocessing have a problem with objects that contain references to other objects?

Is there a way to get around this?

Why 328?
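
One possible workaround, sketched here as an assumption rather than a tested fix for the full FP tree: give `link` a `__reduce__` that walks the chain with a loop and hands pickle a flat list, so pickling no longer recurses once per node. `_rebuild_chain` and `chain_length` are hypothetical helpers. Raising the limit with `sys.setrecursionlimit` is the quicker alternative, but a high enough limit can overflow the C stack and crash the interpreter.

```python
import pickle


def _rebuild_chain(states):
    """Recreate a chain from per-node attribute dicts, head first."""
    node = None
    for state in reversed(states):
        new = link.__new__(link)     # skip __init__, restore attributes
        new.__dict__.update(state)
        new.link = node
        node = new
    return node


class link:
    def __init__(self, link=None):
        self.link = link

    def __reduce__(self):
        # Walk the chain with a loop instead of letting pickle recurse into
        # self.link, recording each node's other attributes.
        states = []
        node = self
        while node is not None:
            state = dict(node.__dict__)
            state.pop("link", None)
            states.append(state)
            node = node.link
        # On loading, pickle calls _rebuild_chain(states).
        return (_rebuild_chain, (states,))


def chain_length(node):
    length = 0
    while node is not None:
        length += 1
        node = node.link
    return length


head = link()
for _ in range(5000):
    head = link(head)
restored = pickle.loads(pickle.dumps(head))
print(chain_length(restored))  # 5001
```

A real FP tree node also carries `parent`, `children` and `node_link` pointers, so the same idea would need to serialize all nodes as a flat list with integer indices standing in for those references.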

Khris
  • Is this the real code that fails? The signature of the stacktrace doesn't match the posted code. Does your `link` class have a `__str__` or `__repr__` method that uses recursion and may be called somewhere? That is what the error message hints at. – mata Mar 08 '17 at 09:39
  • I haven't posted the full stacktrace, only the last (interesting) part. I might already have found my answer. – Khris Mar 08 '17 at 09:43
  • Possible duplicate of [Python: Maximum recursion depth exceeded](http://stackoverflow.com/questions/8177073/python-maximum-recursion-depth-exceeded) – Khris Mar 08 '17 at 09:50
