
I tried using ParallelALSFactorizationJob, but it crashes here:

Exception in thread "main" java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)

The command-line help mentions using the local filesystem, but it seems to want Hadoop. How can I run it on Windows? The mahout.cmd file is broken:

"===============DEPRECATION WARNING===============" "This script is no longer supported for new drivers as of Mahout 0.10.0" "Mahout's bash script is supported and if someone wants to contribute a fix for this" "it would be appreciated."

So is that possible (ALS + Windows - Hadoop)?

Stepan Yakovenko
You can import the Mahout jars (mahout-core, mahout-math, etc.) into your Java app and run it locally; there is a nice book called "Mahout in Action" which describes the usage of the various Mahout classes. – mangusta Oct 23 '18 at 05:01
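For illustration, here is a minimal sketch of that approach (not from the original post). It assumes Mahout 0.9's Taste API, whose ALSWRFactorizer runs alternating least squares entirely in memory, so no Hadoop is involved; the file name ratings.csv, the user ID, and the hyperparameters are illustrative assumptions.

    import java.io.File;

    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.recommender.svd.ALSWRFactorizer;
    import org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.recommender.Recommender;

    public class LocalAlsSketch {
        public static void main(String[] args) throws Exception {
            // Plain local file with lines "userID,itemID,rating" -- no HDFS needed.
            DataModel model = new FileDataModel(new File("ratings.csv"));

            // In-memory ALS with weighted regularization:
            // 10 latent features, lambda 0.065, 15 iterations (illustrative values).
            ALSWRFactorizer factorizer = new ALSWRFactorizer(model, 10, 0.065, 15);
            Recommender recommender = new SVDRecommender(model, factorizer);

            // Print the top 5 recommendations for user 42.
            for (RecommendedItem item : recommender.recommend(42L, 5)) {
                System.out.println(item.getItemID() + " : " + item.getValue());
            }
        }
    }

Note that the Taste classes were deprecated around Mahout 0.9 and later moved to the mahout-mr artifact, so pin your dependency version accordingly.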

1 Answer


Mahout is a community-driven project and its community is very strong.

"Apache Mahout is one of the first and most prominent Big Data machine learning platforms. It implements machine learning algorithms on top of distributed processing platforms such as Hadoop and Spark."

-Tiwary, C. (2015). Learning Apache Mahout.

Apache Spark is an open-source, in-memory, general-purpose computing system that runs on both Windows and Unix-like systems. Instead of Hadoop-style disk-based computation, Spark loads data into cluster memory, where it can be queried repeatedly.

"As Spark is gaining popularity among data scientists, the Mahout community is also quickly working on making Mahout algorithms function on Spark's execution engine to speed up its calculation 10 to 100 times faster. Mahout provides several important building blocks to create recommendations using Spark."

-Gupta, A (2015). Learning Apache Mahout Classification.

(This last book also provides a step-by-step guide to using Mahout's Spark shell; the authors don't use Windows, and it isn't clear whether they use Hadoop. For more information on that topic, see the implementation section at https://mahout.apache.org/users/sparkbindings/play-with-shell.html.)

In addition to this, you can build recommendation engines using the DataFrames, RDDs, Pipelines, and Transformers available in Spark MLlib, and

"in Spark, (...) the Alternating Least Squares (ALS) method is used for generating model-based collaborative filtering."

-Gorakala, S. (2016). Building Recommendation Engines.
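Here is a minimal sketch of that Spark route (assuming Spark 2.x on the classpath and a local ratings.csv with the header userId,itemId,rating; the file name, column names, and hyperparameters are illustrative). The key point is master("local[*]"): Spark runs in-process, so no Hadoop cluster or HDFS is required, and a plain Windows path works.

    import org.apache.spark.ml.recommendation.ALS;
    import org.apache.spark.ml.recommendation.ALSModel;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SparkAlsSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("ALS on Windows")
                    .master("local[*]")   // in-process, no cluster, no HDFS
                    .getOrCreate();

            // Read a plain local CSV: "userId,itemId,rating" with a header row.
            Dataset<Row> ratings = spark.read()
                    .option("header", "true")
                    .option("inferSchema", "true")
                    .csv("ratings.csv");

            // Model-based collaborative filtering via ALS (illustrative settings).
            ALS als = new ALS()
                    .setMaxIter(10)
                    .setRegParam(0.1)
                    .setUserCol("userId")
                    .setItemCol("itemId")
                    .setRatingCol("rating");
            ALSModel model = als.fit(ratings);

            // Show the top 5 item recommendations for every user.
            model.recommendForAllUsers(5).show(false);

            spark.stop();
        }
    }

One caveat: Spark bundles Hadoop client libraries, so on Windows the logs may complain about a missing winutils.exe (the same Shell code path as the NullPointerException in the question). In local mode this is usually harmless; if it does cause errors, the common workaround is to point HADOOP_HOME at a directory containing winutils.exe.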

At this point, there's one question still to answer before answering yours: can we run Spark without Hadoop? Yes: Spark can run in local or standalone mode without any Hadoop installation, and HDFS is only one of several storage back ends it can read from.

So, yes, it's possible to use the ALS method on Windows using Spark (without Hadoop).

Tiago Martins Peres