I have a PySpark script that contains a groupBy. Its structure is:
import operator
result = sc.textFile(...).map(...).groupBy(...).map(...).reduce(operator.add)
When I run this in an IPython pyspark shell, it works fine. However, when I put it in a script and run it through spark-submit, I get the following error, with the traceback pointing at the groupBy:
pickle.PicklingError: Can't pickle builtin <type 'method_descriptor'>
Is there a known workaround for this?
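For context, 'method_descriptor' is the type CPython reports for a built-in method looked up on the type itself, such as str.split. The sketch below is not taken from my script. It only shows which kinds of callables have that type, and that wrapping the call in a lambda yields an ordinary function instead. The helper type_name and the str.split example are hypothetical, and whether one of the functions passed to map or groupBy is actually such a descriptor is an assumption I have not confirmed.

```python
import operator


def type_name(func):
    """Return the name of the callable's type as CPython reports it."""
    return type(func).__name__


# A built-in method accessed on the type (hypothetical example, not from the script).
descriptor = str.split

# The same call wrapped in a lambda is a plain Python function.
wrapped = lambda s: s.split()

for label, func in [("str.split", descriptor),
                    ("lambda wrapper", wrapped),
                    ("operator.add", operator.add)]:
    print(label, "->", type_name(func))
```

If the functions in the real pipeline turn out to include something like str.split passed directly, the lambda form would be the first thing to try, though that is a guess and not a confirmed fix.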