Best programming way to implement Map Reduce

Question

We have a problem that is an ideal fit for the MapReduce programming technique. The initial code for it is written in Python. We now have the following options:

1. Use Hadoop and Java to implement the MapReduce part.
2. Use mincemeat and Python to implement the MapReduce part.
3. Use Hadoop and Python (Hadoop MapReduce Program in Python) to implement the MapReduce part.

I'm not sure which is the best option. Can anyone please help?

Some people are even doubting Hadoop and they are going to suggest Apache Spark. Not really a SO question. — Chiron, Feb 02 '15 at 15:57
Since your initial code is in python and it doesn't make much of a difference whether writing MR in python or Java, (3) should be the best option to pursue for your scenario. — Apurv, Feb 02 '15 at 17:04

score 3 · Accepted Answer · answered Feb 02 '15 at 17:06


Since your initial code is in Python, and it doesn't make much difference whether you write the MR jobs in Python or Java, (3) should be the best option for your scenario. You might also like to explore libraries like https://github.com/Yelp/mrjob, which make it easier to write MR jobs in Python.
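To give a feel for option (3), here is a minimal sketch of a mapper and reducer in the style Hadoop Streaming expects: tab-separated key/value lines, with the reducer receiving records already sorted by key. The question does not say what the actual problem is, so word counting is used purely as a stand-in, and the names `mapper`, `reducer`, and `run_locally` are hypothetical helpers, not part of Hadoop or mrjob.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word (word count is an assumed example)."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts for each key; the input must already be sorted by key."""
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Hypothetical helper: imitate map -> sort -> reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))
```

In a real streaming job the mapper and reducer would normally be two separate scripts that read `sys.stdin` and print their records, with Hadoop doing the sort between them. `run_locally` only imitates that pipeline so the logic can be checked before it goes anywhere near a cluster.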


Apurv


Thanks! Also, after going through http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/, mrjob seems to be the best option. – Saurabh Verma Feb 06 '15 at 12:09
