I need to use a non-serialisable third-party class in my functions on all executors in Spark, for example:
JavaRDD<String> resRdd = origRdd
    .flatMap(new FlatMapFunction<String, String>() {
        @Override
        public Iterable<String> call(String t) throws Exception {
            // A DynamoDB mapper I don't want to initialise every time
            DynamoDBMapper mapper = new DynamoDBMapper(new AmazonDynamoDBClient(credentials));
            Set<String> userFav = mapper.load(userDataDocument.class, userId).getFav();
            return userFav;
        }
    });
I would like to have a static DynamoDBMapper mapper that I initialise once per executor and can then reuse over and over again. Since it isn't serialisable, I can't initialise it once in the driver and broadcast it.
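What I'm picturing is something like a lazily initialised static holder, which each executor JVM would create at most once on first use. This is only a sketch of the pattern: the names `MapperHolder` and `ExpensiveClient` are illustrative stand-ins (the real thing would hold a `DynamoDBMapper`), not code from my job:

```java
public class MapperHolder {
    // Initialization-on-demand holder idiom: Lazy.INSTANCE is created
    // exactly once per JVM (i.e. once per executor), the first time
    // any task in that JVM calls get().
    private static class Lazy {
        static final ExpensiveClient INSTANCE = new ExpensiveClient();
    }

    public static ExpensiveClient get() {
        return Lazy.INSTANCE;
    }

    // Placeholder for the non-serialisable mapper; counts constructions
    // so we can see the single-initialisation behaviour.
    static class ExpensiveClient {
        static int initCount = 0;

        ExpensiveClient() {
            initCount++;
        }
    }

    public static void main(String[] args) {
        // Repeated calls reuse the one instance created on first access.
        MapperHolder.get();
        MapperHolder.get();
        System.out.println(MapperHolder.ExpensiveClient.initCount); // prints 1
    }
}
```

Inside the `flatMap` function I would then call `MapperHolder.get()` instead of constructing a new mapper per record, so nothing non-serialisable is captured by the closure.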
Note: there is an answer to this here (What is the right way to have a static object on all workers), but it only covers Scala.