How can one write to multiple outputs, one per key, from an RDD in a single job using Python and Spark? I know I could call .filter once for every possible key, but that is a lot of boilerplate and launches a separate job per key.
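For context, this is the per-key filtering I want to avoid, sketched with made-up key values (PATH is defined in my attempt below):

for sport in ["tennis", "soccer"]:  # hypothetical key list
    # saveAsTextFile is an action, so each iteration launches its own job.
    recommendation2.filter(lambda x: x[0] == sport) \
                   .map(lambda x: json.dumps(x[1])) \
                   .saveAsTextFile(os.path.join(PATH, sport))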
Similar to this question: Write to multiple outputs by key Spark - one Spark job

However, the answer to that question is in Scala. I am looking for a how-to in Python.
Here is my attempt, which tries to smuggle the key out of the map through a global variable. It does not work: format_for_output runs lazily on the executors, so the driver-side global is never updated before saveAsTextFile builds its path.

import os
import json

PATH = os.path.join("s3://asdf/hjkl", 'temp_date', "intermediate_data/")

current_sport = ''

def format_for_output(x):
    global current_sport  # without this, the assignment below creates a local
    current_sport = x[0]  # executed on the executors, never seen by the driver
    return json.dumps(x[1])

recommendation2.map(format_for_output).saveAsTextFile(os.path.join(PATH, current_sport))
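One direction I have considered is to let the writer split the output by key. This is a minimal sketch, assuming Spark 2.x+ and that the DataFrame API is acceptable; the column names (sport, value) are my own invention, not from the code above:

from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

# Two columns: the key and the already-serialized payload. The text sink
# below expects exactly one string data column besides the partition column.
df = recommendation2.map(
    lambda x: Row(sport=x[0], value=json.dumps(x[1]))
).toDF()

# Writes one subdirectory per distinct key in a single job,
# e.g. .../intermediate_data/sport=tennis/part-*
df.write.partitionBy("sport").text(PATH)

Note that the key ends up in Hive-style directory names (sport=tennis) rather than bare key-named folders, which differs slightly from the MultipleTextOutputFormat layout in the Scala answer.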