The goal of the Java-based Mahout project is to build scalable machine learning libraries. Are there any equivalent libraries in Python?

You could use Jython or JPype to integrate Mahout with your Python code. See my similar question: http://stackoverflow.com/questions/7491953/is-there-any-python-libraries-for-mahout – Matt Alcock Oct 14 '11 at 13:39
Python is not considered a good choice for computations on large datasets, since performance becomes prohibitively slow. – Swapnil Dec 28 '12 at 15:15
5 Answers
scikit-learn is highly recommended: http://scikit-learn.sourceforge.net/

Just a note: the current implementation of scikit-learn is not yet able to leverage a Hadoop cluster for distributed computing. It is, however, scalable enough to address medium-sized problems (e.g. hundreds of thousands of samples and features for linear models), especially if you use sparse representations and/or memmap'ed arrays. – ogrisel May 31 '11 at 12:03
Spark MLlib is recommended. It is a scalable machine learning library that can read data from HDFS and, of course, runs on top of Spark.
You can access it via PySpark (see the Programming Guide's Python examples).
Orange is supposedly pretty decent, from what I've heard, but I've never used it personally. PyML might be worth taking a look at as well. Also, Monte.

Orange isn't even close to being scalable. Nearly all of its algorithms are slow batch processes, and they have no intention of making them otherwise due to the academic orientation of the project. Sadly, there really isn't any Python equivalent of Mahout. – Cerin Jan 28 '11 at 13:07
@Chris: scikit-learn is probably not there yet, but its goal is to be scalable and to avoid the pitfalls of academic-oriented projects. Some of our implementations for standard problems already scale quite well. – Gael Varoquaux Jan 30 '11 at 11:48
pysuggest is a Python wrapper for SUGGEST, a Top-N recommendation engine that implements a variety of recommendation algorithms for collaborative filtering.

An interesting library is crab.
As of this post, the library only has stable implementations for collaborative filtering algorithms: user-based and item-based.
An SVD implementation is included, but it's experimental, and content-based algorithms are on the roadmap.
Do check it out!
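To give a rough idea of what user-based collaborative filtering does, here is a minimal sketch in plain Python. It is not crab's API: the function names `cosine_similarity` and `recommend` and the toy `ratings` data are made up for illustration, and cosine similarity is just one common choice of similarity measure.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two {item: rating} dicts.

    The dot product runs over shared items; the norms use all rated items.
    """
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=3):
    """Rank items `user` has not rated by the similarity-weighted
    average of the ratings given by other users."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with no overlap
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Hypothetical toy data: alice's tastes are close to bob's, far from carol's.
ratings = {
    "alice": {"A": 5, "B": 1},
    "bob": {"A": 5, "B": 1, "C": 5, "D": 2},
    "carol": {"A": 1, "B": 5, "C": 1, "D": 5},
}

recs = recommend(ratings, "alice")
print(recs)
```

The item-based variant swaps the roles: it compares items by the users who rated them rather than users by the items they rated. A pure-Python loop like this is only meant to show the idea and will not scale to large rating matrices.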
