
I am working with Spark on Windows. I have successfully set up Spark and the environment variables in Windows, and my programs run in the Scala IDE with no issues. Now I need to use Mahout library functions for machine learning. I tried to use the link here to make Mahout work on Windows, but without luck; it is not working. My Scala IDE says: "Unable to read output from "mahout -spark classpath". Is SPARK_HOME set?"

Does anyone know how to set Mahout for windows properly? Thanks in advance.

user3086871
  • Please read [Why is “Can someone help me?” not an actual question?](https://meta.stackoverflow.com/questions/284236/why-is-can-someone-help-me-not-an-actual-question) before attempting to ask more questions. –  Jul 18 '17 at 15:58
  • Please read [How do I ask a good question?](http://stackoverflow.com/help/how-to-ask) before attempting to ask more questions. –  Jul 18 '17 at 15:58

2 Answers


We at the Mahout project do not support Windows directly. VMs are free now, so I'd suggest installing one for most of the JVM (Java virtual machine) tools from Apache. Some will work on Windows natively, but they all work on Linux. Then install the *nix that you may use in production. This has several benefits.

Alternatively, newer versions of Windows have a Linux subsystem (the Windows Subsystem for Linux) that allows installation of a guest environment like Ubuntu. This would be an experiment, since I haven't tried it: https://msdn.microsoft.com/en-us/commandline/wsl/install_guide

Not sure if this uses container or VM tech, but it sounds promising.

pferrel

That link is overkill.

If you're trying to run Mahout on Spark in a REPL environment, all you should need to do is set some env variables.

Have you set SPARK_HOME? (Try `echo $SPARK_HOME` on Linux/macOS; in a Windows command prompt the equivalent is `echo %SPARK_HOME%`.)
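
Since the error is coming from inside the Scala IDE, it can also help to check what the IDE's JVM actually sees; environment variables set for your shell aren't always visible to the IDE process. Here is a minimal Scala sketch (nothing Mahout-specific, just the standard library) that prints whether SPARK_HOME and MAHOUT_HOME are set for the running process:

    // Print whether SPARK_HOME and MAHOUT_HOME are visible to this JVM process.
    object EnvCheck extends App {
      Seq("SPARK_HOME", "MAHOUT_HOME").foreach { name =>
        sys.env.get(name) match {
          case Some(value) => println(s"$name = $value")
          case None        => println(s"$name is NOT set for this process")
        }
      }
    }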

The other approach would be to use Apache Zeppelin, which imho is a much nicer experience to work with. Tutorial

I haven't heard of anyone doing Mahout on Windows, but it should be straightforward. If/when you get it working, please write a tutorial and we'll post it on the website (I'm a community member). We can help you out; please reach out on the developer email list.

Update

If you're having trouble running bin/mahout, you can either install Cygwin (thus creating a Unix-like environment), or you can try the following:

    export MAHOUT_JARS=$(echo "$MAHOUT_HOME"/*.jar | tr ' ' ',')

    $SPARK_HOME/bin/spark-shell --jars "$MAHOUT_JARS" \
        -i $MAHOUT_HOME/bin/load-shell.scala \
        --conf spark.kryo.referenceTracking=false \
        --conf spark.kryo.registrator=org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator \
        --conf spark.kryoserializer.buffer=32k \
        --conf spark.kryoserializer.buffer.max=600m \
        --conf spark.serializer=org.apache.spark.serializer.KryoSerializer

This should start the spark-shell with the Mahout jars, the proper Spark config, and the Mahout startup script (which imports libraries and sets up the Mahout distributed context). But personally, I'd recommend Zeppelin (see the tutorial link above).
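
Once the shell is up, a quick smoke test is to run a small Samsara expression. This is just a sketch under a couple of assumptions: that you're in spark-shell where `sc` already exists, and that the startup script hasn't already created an implicit Mahout distributed context (if it has, skip the `sdc` line):

    // Mahout "Samsara" Scala DSL imports
    import org.apache.mahout.math._
    import org.apache.mahout.math.scalabindings._
    import org.apache.mahout.math.scalabindings.RLikeOps._
    import org.apache.mahout.math.drm._
    import org.apache.mahout.math.drm.RLikeDrmOps._
    import org.apache.mahout.sparkbindings._

    // Wrap the shell's SparkContext in a Mahout distributed context
    // (skip this if the init script already defined one).
    implicit val sdc: SparkDistributedContext = sc2sdc(sc)

    // Tiny distributed row matrix; compute A' * A and bring it back in-core.
    val drmA = drmParallelize(dense((1, 2, 3), (4, 5, 6)), numPartitions = 2)
    val ata = (drmA.t %*% drmA).collect
    println(ata)

If the imports resolve and the multiply runs, the Mahout jars and Kryo config are wired up correctly.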

rawkintrevo
  • I am afraid it is not straightforward, because the commands/scripts are bash scripts and only work on Linux. That link gave somewhat similar scripts for Windows, but they are outdated now. Need someone to update the script, or another easier way to use Mahout. Btw, I have successfully set the env variables in Windows. – user3086871 Jul 17 '17 at 13:58