I have an rdd which contains key value pairs. There are just 3 keys, and I would like to write all the elements for a given key to a textfile. Currently I am doing this in 3 passes, but I wanted to see if I could do it in one pass.
Here is what I have so far:
# I have an rdd (called my_rdd) such that a record is a key value pair, e.g.:
# ('data_set_1','value1,value2,value3,...,value100')
my_rdd.cache()
my_keys = ['data_set_1','data_set_2','data_set_3']
for key in my_keys:
my_rdd.filter(lambda l: l[0] == key).map(lambda l: l[1]).saveAsTextFile(my_path+'/'+key)
this works, however caching it and iterating through three times can be a lengthy process. I am wondering if there is any way to simultaneously write all three files?