
I am working with PySpark and using a mixin factory that combines two classes.

But each time the map function is called, the kernel just crashes. I tried to debug it and searched for a relevant solution, but didn't find anything.

At the moment I have multiple classes, which are instantiated as needed. Interestingly, this pattern worked on a previous version of Spark (1.6) but doesn't work with Spark 2.0 and later.

I believe it's due to conflicting, identically named generated classes among the workers...

A correction or some reference would be deeply appreciated.

def mixin_factory(name, base, mixin):
    class _tmp(base, mixin):
        pass
    _tmp.__name__ = name
    return _tmp

def Mix_map_function(dataframe):
    MixClass = mixin_factory("MixClass", Class_A, Class_B)
    MixClass(...., dataframe)  # class initialization parameters passed to the constructor

# x[0] is some partitioned data from the RDD
PiplinedRDD.map(lambda x: Mix_map_function(x[0]), preservesPartitioning=True)
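
For reference, here is a minimal, self-contained version of the pattern above. Class_A, Class_B, the constructor arguments, and the sample data are stand-ins I've made up, since the real definitions aren't shown, so it may or may not reproduce the crash on its own:

from pyspark import SparkContext

class Class_A:                        # stand-in: the real Class_A is not shown
    def __init__(self, value, dataframe):
        self.value = value
        self.dataframe = dataframe

class Class_B:                        # stand-in mixin
    def describe(self):
        return (self.value, self.dataframe)

def mixin_factory(name, base, mixin):
    # build a new class deriving from both the base class and the mixin
    class _tmp(base, mixin):
        pass
    _tmp.__name__ = name
    return _tmp

def Mix_map_function(data):
    MixClass = mixin_factory("MixClass", Class_A, Class_B)
    return MixClass(42, data).describe()

sc = SparkContext.getOrCreate()
rdd = sc.parallelize([("a",), ("b",)])
print(rdd.map(lambda x: Mix_map_function(x[0]), preservesPartitioning=True).collect())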
    This is just a guess, but since your class is generated at runtime it probably only exists on the master node. What happens if you simply declare MixClass "normally" ? – Hitobat Apr 30 '18 at 16:23
  • Yes, indeed I tried, and everything works perfectly fine if I don't use PySpark and just run the program normally. – Zafar Mahmood Apr 30 '18 at 16:28
  • Possible duplicate of [How to use custom classes with Apache Spark (pyspark)?](https://stackoverflow.com/q/31093179) – zero323 Apr 30 '18 at 16:30
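
To make the first comment's suggestion concrete, here is a sketch with MixClass declared once at module level instead of being generated at runtime; Class_A and Class_B are again made-up stand-ins:

from pyspark import SparkContext

class Class_A:                       # stand-in
    def __init__(self, dataframe):
        self.dataframe = dataframe

class Class_B:                       # stand-in mixin
    def tag(self):
        return ("mixed", self.dataframe)

class MixClass(Class_A, Class_B):    # declared "normally", once, at import time
    pass

def Mix_map_function(data):
    return MixClass(data).tag()

sc = SparkContext.getOrCreate()
rdd = sc.parallelize([("a",), ("b",)])
print(rdd.map(lambda x: Mix_map_function(x[0]), preservesPartitioning=True).collect())

If the classes live in a separate module rather than in the driver script, that module also needs to be available on the executors (e.g. via SparkContext.addPyFile or the --py-files option), as discussed in the question linked above.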

0 Answers