
We have a problem that is an ideal case for the MapReduce programming technique. The initial code is written in Python. We now have the following options:

  1. Use Hadoop and Java to implement the MapReduce part.
  2. Use mincemeat and Python to implement the MapReduce part.
  3. Use Hadoop and Python (Hadoop MapReduce Program in Python) to implement the MapReduce part.

I'm not sure which option is best. Can anyone please help?

Saurabh Verma
  • Some people are even doubting Hadoop and they are going to suggest Apache Spark. Not really a SO question. – Chiron Feb 02 '15 at 15:57
  • Since your initial code is in python and it doesn't make much of a difference whether writing MR in python or Java, (3) should be the best option to pursue for your scenario. – Apurv Feb 02 '15 at 17:04

1 Answer


Since your initial code is in Python, and it makes little difference whether you write the MapReduce jobs in Python or Java, (3) should be the best option for your scenario. You might also like to explore libraries like https://github.com/Yelp/mrjob, which make it easier to write MapReduce jobs in Python.
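For a rough idea of what option (3) looks like without any extra library, here is a minimal sketch in the Hadoop Streaming style. It assumes a word-count task, since the question doesn't say what the actual problem is, and the names `mapper`, `reducer` and `run_locally` are hypothetical. Hadoop Streaming runs the mapper and reducer as ordinary programs that read lines from stdin and write tab-separated `key\tvalue` lines to stdout, sorting the mapper output by key before the reduce phase.

```python
#!/usr/bin/env python3
"""Hypothetical word-count job in the Hadoop Streaming style.

Mapper and reducer read lines from stdin and write tab-separated
key/value lines to stdout, which is what Hadoop Streaming expects.
"""
import sys
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Emit "word\t1" for every whitespace-separated token (lowercased).
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    # Hadoop sorts mapper output by key before the reduce phase, so all
    # lines for the same word arrive consecutively and groupby can sum them.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(text: str) -> List[str]:
    # Simulate map -> sort (shuffle) -> reduce in memory, so the logic
    # can be tested without a Hadoop cluster.
    mapped = sorted(mapper(text.splitlines()))
    return list(reducer(mapped))


if __name__ == "__main__":
    # Assumed invocation: "wordcount.py map" or "wordcount.py reduce".
    if len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
        step = mapper if sys.argv[1] == "map" else reducer
        for out in step(sys.stdin):
            print(out)
```

On a cluster, the same script would be passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options (the jar's location depends on your Hadoop distribution). Because the map and reduce logic are plain Python functions, you can reuse your existing code and unit-test it locally with `run_locally` before submitting a job; libraries such as mrjob wrap this same stdin/stdout protocol for you.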

Apurv
  • Thanks! Also, after going through http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/, mrjob seems to be the best option. – Saurabh Verma Feb 06 '15 at 12:09