I am using Amazon EMR, and because the reducers run in parallel, my output gets split across multiple files.
I would like to have a single file instead, with the records in the right order. Is that possible?
The last lines of my reducer look like this:
    for key, value in doc_dict.iteritems():
        print key
        for k, v in value.iteritems():
            print k, v
This is driving me crazy. I can't present the results as they are, because they are mixed up.
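
To make the single-file result described above concrete, here is a minimal sketch. It assumes the job's output has already been copied to a local directory and that the files follow Hadoop's usual part-00000, part-00001, ... naming. The function name merge_part_files and both paths are hypothetical. It only concatenates the files in name order; whether that gives the right sequence depends on how keys were assigned to reducers.

```python
import glob
import os
import shutil


def merge_part_files(input_dir, output_path):
    """Concatenate the part-* files in input_dir into output_path, in name order.

    Hypothetical helper: assumes the reducer outputs were downloaded locally.
    """
    # The part numbers are zero-padded, so a plain lexicographic sort
    # matches the numeric order of the reducers.
    part_paths = sorted(glob.glob(os.path.join(input_dir, "part-*")))
    with open(output_path, "wb") as merged:
        for path in part_paths:
            with open(path, "rb") as part:
                # Stream each part file into the merged file without
                # loading it fully into memory.
                shutil.copyfileobj(part, merged)
    return part_paths
```

It could be called as merge_part_files("output", "merged.txt"), where both names are placeholders for the local download directory and the desired result file.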