I have the following simple code causing an error regarding caching:
trips_in = sc.textFile("trip_data.csv")
trips = trips_in.map(lambda l: l.split(",")).map(lambda x: parseTrip(x)).cache()
trips.count()
The function parseTrip() takes a list of strings and creates and returns an instance of the class Trip:
class Trip:
    def __init__(self, id, duration):
        self.id = id
        self.duration = duration
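For context, parseTrip() looks roughly like the sketch below. The column positions (id first, duration second) and the int conversion are assumptions made for illustration, not the exact code from my notebook:

```python
class Trip:
    def __init__(self, id, duration):
        self.id = id
        self.duration = duration


def parseTrip(fields):
    # Assumption: column 0 holds the trip id and column 1 the duration.
    # The real CSV layout may differ.
    return Trip(fields[0], int(fields[1]))


# Example with a made-up CSV line, split the same way as in the map() above.
trip = parseTrip("T1,360".split(","))
print(trip.id, trip.duration)
```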
I get the error right after the action count(). However, if I remove the cache() at the end of the second line, everything works fine.
According to the error, the problem is that the class Trip cannot be pickled:
PicklingError: Can't pickle __main__.Trip: attribute lookup __main__.Trip failed
So how can I make it picklable (if that is an actual word)? Note that I am using a Databricks notebook, so I cannot put the class definition in a separate .py file to make it picklable.