I am new to the Big Data ecosystem. I am trying to install Apache Spark, but the tutorials I found online ask me to install a virtual machine first. Can someone please explain why I need a VM on my Windows machine?
2 Answers
You don't.
Spark runs on the JVM (Java Virtual Machine), and Java runs on all major operating systems, including Windows.
Tutorials might use the Hortonworks or Cloudera VMs because everything in them is pre-configured, but that is just a convenience; you could do the same setup on your own OS.
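As a rough illustration of what "on your own OS" can look like, here is a minimal sketch in Python that checks whether Java is on your PATH and builds a `spark-submit` command for local mode. It assumes Spark is already unpacked and its `bin` directory is on your PATH; the helper names `find_java` and `local_spark_command` and the file `my_app.py` are hypothetical, made up for this example. It only prints the command and does not run Spark.

```python
import shutil


def find_java():
    """Return the path to the `java` executable on PATH, or None if absent."""
    return shutil.which("java")


def local_spark_command(app_path, cores="*"):
    """Build a spark-submit command that runs an application in local mode.

    `local[N]` runs Spark on this machine with N worker threads;
    `local[*]` uses as many threads as there are logical cores.
    """
    return ["spark-submit", "--master", f"local[{cores}]", app_path]


if __name__ == "__main__":
    java = find_java()
    if java is None:
        print("Java not found on PATH; install a JDK before installing Spark.")
    else:
        print(f"Java found at {java}")
    # my_app.py is a hypothetical application file used only for illustration.
    print(" ".join(local_spark_command("my_app.py")))
```

The `--master local[*]` option is what tells Spark to run everything on the local machine, so no VM or cluster is involved.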

- Hi cricket_007, I am new to Spark, and most of the tutorials I found online use either Hortonworks or Cloudera; now I understand why. Thank you. – Arun kumar Feb 23 '19 at 20:06
Apache Spark does not require a virtual machine; you can run it perfectly well locally on your own computer. However, software such as Apache Spark is normally used to process huge amounts of data, which means running many instances of it in a cluster. With large data sets it therefore often makes more sense to run those instances on virtual machines, several of which can share a single server, instead of dedicating one physical server to each instance.

- Hi Rietty, Thank you for explaining the reasoning behind it. I am new to Spark and I feel like I have a better understanding now. Thanks again, Arun. – Arun kumar Feb 23 '19 at 20:05
- @Arunkumar Mark an answer as accepted (green checkmark) if it answered the question. – Rietty Feb 23 '19 at 20:10