I am trying to set up a fully-distributed Hadoop/MapReduce cluster where each node will run a series of C++ Hadoop Streaming tasks on some input. However, I don't want to move all of the input data onto HDFS; instead, I want to see if there is a way for each node to read its input data from its own local folders.
Is there any way to do this?
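To clarify what I am trying to avoid: the usual approach, as I understand it, would be to first stage the input into HDFS, along the lines of:

# the staging step I would like to skip (paths here are just illustrative)
hadoop fs -put /data/ /data/

and then point -input at that HDFS path. Instead, I want each node to read the /data/ directory that already exists on its own local disk.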
EDIT: An example of the Hadoop command I would like to run is something similar to:
hadoop jar $HADOOP_STREAM/hadoop-streaming-0.20.203.0.jar \
    -mapper map_example \
    -input file:///data/ \
    -output /output/ \
    -reducer reducer_example \
    -file map_example \
    -file reducer_example
In this case, the data stored on each of my nodes is in the /data/ directory, and I want the output to go to the /output/ directory of that individual node. The map_example and reducer_example binaries are available locally on all nodes.
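In other words, what I'd like each node to end up doing is roughly the equivalent of the following local pipeline (a rough sketch of the intent only, assuming map_example and reducer_example read stdin and write stdout as Hadoop Streaming expects; the part-00000 file name is just illustrative):

# what each node should effectively do with its own local data
# (sort stands in for the shuffle between map and reduce)
cat /data/* | ./map_example | sort | ./reducer_example > /output/part-00000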
How would I implement a Hadoop command such that, when it is run on the master node, all x slave nodes essentially run the same task, each producing a local output file on that node (based on its local input files)?
Thanks