I'm new to Hadoop and trying to install it on my local machine. I see that there are many ways of installing Hadoop, like installing VMware with Hortonworks and installing Hadoop on top of that, or installing Oracle VirtualBox with Cloudera and then Hadoop. My question is: is it mandatory to install a virtual box for running Hadoop? To put it the other way, does Hadoop run only on a Linux OS like Ubuntu, Red Hat, etc., or can I install Hadoop directly on Windows (without any virtual box)?
2 Answers
VirtualBox or VMware is not mandatory to install/configure Hadoop. Generally people use VirtualBox to create multiple virtual machines and set up a Hadoop cluster for experimental purposes.
Hadoop also runs on operating systems other than Red Hat/Ubuntu, e.g. Mac OS and Windows.

- Thanks for the reply. I'm confused by another question. Say I install a Hadoop single-node cluster (on my local Windows machine) without installing any VMware or VirtualBox. As Hadoop stores data using the HDFS architecture, is my OS customized by installing Hadoop on top of Windows? I read that the block size in Hadoop is 64 MB vs. 4 KB in a traditional Windows OS. If I run Hadoop on my local machine, how does the concept of parallel processing work, which is the key concept/solution to Big Data? – Sharath Jun 20 '15 at 21:54
- Hadoop processes/daemons are JVMs (i.e. Java processes), which are nothing but applications running on top of the operating system. Installing Hadoop will not change your OS. In HDFS we can set a block size suitable for each application (the default is 64 MB). In Hadoop, the basic unit of data processing is the HDFS block, not OS blocks (see the first sketch after these comments). – Shubhangi Jun 25 '15 at 19:00
- The key concept of Hadoop, parallel processing, is realized especially in a multi-node cluster, where data is stored across multiple machines/nodes and processed at the same time; hence, parallel processing (see the word-count sketch after these comments). – Shubhangi Jun 25 '15 at 19:02
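To make the block-size point concrete, here is a minimal sketch using Hadoop's Java `FileSystem` API; the file path, buffer size, replication factor, and block sizes are illustrative assumptions. It sets a default via the `dfs.blocksize` property (`dfs.block.size` in Hadoop 1.x, where the default was 64 MB) and also picks a block size for a single file at creation time:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Default block size for new files; property name is
        // "dfs.blocksize" in Hadoop 2.x ("dfs.block.size" in 1.x).
        conf.setLong("dfs.blocksize", 128L * 1024 * 1024);

        FileSystem fs = FileSystem.get(conf);

        // The block size can also be chosen per file at creation time:
        // create(path, overwrite, bufferSize, replication, blockSize)
        FSDataOutputStream out = fs.create(
                new Path("/tmp/demo.txt"), // hypothetical path
                true,                      // overwrite if it exists
                4096,                      // I/O buffer size in bytes
                (short) 1,                 // replication factor
                64L * 1024 * 1024);        // 64 MB HDFS block size
        out.writeBytes("hello hdfs");
        out.close();
        fs.close();
    }
}
```

The block size here only controls how HDFS chops the file into blocks; the underlying OS still stores the bytes in its own 4 KB blocks.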
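And to illustrate the parallel-processing comment, this is the classic MapReduce word count, sketched against the Hadoop 2.x `org.apache.hadoop.mapreduce` API with input/output paths passed as arguments. Each mapper processes one HDFS block, ideally on the node that stores it, so on a multi-node cluster the map phase runs in parallel across machines:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Each mapper instance works on one input split (typically one HDFS
    // block), so the map phase runs in parallel across the cluster.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducers aggregate the partial counts emitted by all mappers.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```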
Hadoop runs on Unix and on Windows. Linux is the only supported production platform, but other flavors of Unix (including Mac OS X) can be used to run Hadoop for development. Windows is only supported as a development platform, and additionally requires Cygwin to run.
If you have a Linux OS, you can install Hadoop directly and start working. If you have a Windows OS and do not know Linux (though eventually you will have to learn it), then you can use VirtualBox or VMware to have a virtual Linux machine running on your Windows machine.
Using a Hortonworks or Cloudera distribution is your choice; they are packaged distributions that come on top of Hadoop.
Also, if you have a cluster set up on some remote machine, then you just need PuTTY to try out all the Hadoop features from a shell. :)

- Thanks for the reply. I'm confused by another question. Say I install a Hadoop single-node cluster (on my local Windows machine) without installing any VMware or VirtualBox. As Hadoop stores data using the HDFS architecture, is my OS customized by installing Hadoop on top of Windows? I read that the block size in Hadoop is 64 MB vs. 4 KB in a traditional Windows OS. If I run Hadoop on my local machine, how does the concept of parallel processing work, which is the key concept/solution to Big Data? – Sharath Jun 21 '15 at 04:10
- Well, the 64 MB is for the way Hadoop does its processing. The OS has its own block size, as you mentioned: 4 KB. Even though the data is eventually stored as per the OS design, Hadoop takes care of splitting it into chunks of 64 MB and then retrieving them based on its own algorithm; the sketch below shows how to inspect those blocks. This [link1](http://stackoverflow.com/questions/19473772/data-block-size-in-hdfs-why-64mb) will help you more. – Ramzy Jun 21 '15 at 04:30
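As a rough sketch of what Ramzy describes (the file path is a placeholder argument), Hadoop's `FileSystem.getFileBlockLocations` lets you see how HDFS has split a file into its own blocks, independent of the OS's 4 KB blocks:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockReport {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path(args[0]); // e.g. a path to a file in HDFS
        FileStatus status = fs.getFileStatus(file);

        // HDFS tracks its own (64/128 MB) blocks, regardless of the
        // 4 KB blocks the underlying OS file system uses.
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
        System.out.println("HDFS block size: " + status.getBlockSize()
                + " bytes, block count: " + blocks.length);
        for (BlockLocation b : blocks) {
            // Each block reports which datanodes hold a replica of it.
            System.out.println("offset=" + b.getOffset()
                    + " length=" + b.getLength()
                    + " hosts=" + String.join(",", b.getHosts()));
        }
        fs.close();
    }
}
```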