I don't want to run Spark on a cluster. My only reason for using Spark is MLlib. In a nutshell, I need to use the MLlib jar in my application with the bare minimum of dependencies. Currently, my Spark assembly jar is around 125 MB. Is there any way to reduce its size?
- Can you describe your setup? What build tools are you using? The description in your question is minimal, which could get the question closed as too broad. – eliasah Jun 27 '18 at 11:49
- My setup is a Tomcat web application. It serves a REST API for Spark MLlib. I use algorithms in ml and mllib. I am using Maven for building. By minimal I mean that the Spark assembly jar contains many dependencies (it is a fat jar). I want to reduce it as much as possible while still being able to use Spark MLlib. – Santhosh Tpixler Jun 27 '18 at 13:08
- OK, let me put this another way. Do you at least have standalone Spark running? – eliasah Jun 27 '18 at 13:10
- It would be great to run it without Hadoop. – Santhosh Tpixler Jun 27 '18 at 13:12
- Currently I am running the prebuilt spark-assembly jar in standalone mode. – Santhosh Tpixler Jun 27 '18 at 13:14
- If you don't have Spark, you can't use Spark ml/mllib. If you don't want to use Spark, I suggest reading the following: https://stackoverflow.com/questions/40533582/how-to-serve-a-spark-mllib-model/40536323#40536323 – eliasah Jun 27 '18 at 13:21
- Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/173887/discussion-between-santhosh-tpixler-and-eliasah). – Santhosh Tpixler Jun 27 '18 at 13:22
1 Answer
Depending on how the application is going to be used, you can mark dependencies as provided. That will reduce the size of your jar, so deployments will be faster.
Also, check whether the Maven assembly includes the Scala stdlib in the jar (sbt assembly includes the Scala stdlib by default).
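
As a rough illustration of the provided-scope suggestion, a pom.xml fragment could look like the sketch below. The Scala and Spark versions are assumptions chosen for the example and are not taken from the question; substitute the ones that match your installation.

```xml
<!-- pom.xml fragment: a sketch, not a drop-in configuration -->
<properties>
  <!-- Assumed versions; replace with the ones your Spark installation uses -->
  <scala.binary.version>2.11</scala.binary.version>
  <spark.version>2.3.1</spark.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
    <!-- provided: available at compile time, but not packaged into the war or fat jar -->
    <scope>provided</scope>
  </dependency>
</dependencies>
```

Keep in mind that provided only shrinks the packaged artifact. The Spark jars (and the Scala library they pull in) still have to be on the classpath at runtime, for example by launching through spark-submit or by placing them in the container's shared library directory; otherwise the application will most likely fail with NoClassDefFoundError. To see which Scala artifacts end up in the build, `mvn dependency:tree -Dincludes=org.scala-lang` should list them.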

addmeaning
- I cannot find explicit dependencies for Spark MLlib. The Scala stdlib is also included in the target jar. – Santhosh Tpixler Jun 27 '18 at 13:19