I am working with Flume to ingest a large amount of data into HDFS (on the order of petabytes). I would like to know how Flume makes use of its distributed architecture. I have over 200 servers, and I have installed Flume on only one of them, the machine the data originates from (i.e. the data source); the sink is HDFS. (Hadoop runs on these servers via Serengeti.) I am not sure whether Flume distributes itself across the cluster on its own, or whether I have installed it incorrectly. I followed Apache's user guide for the Flume installation and this SO post:
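For reference, here is a minimal sketch of the single-agent configuration I am running. The agent name, source type, directories, and NameNode address below are illustrative, not my exact values:

    # Single Flume agent: one source, one memory channel, one HDFS sink
    agent1.sources  = src1
    agent1.channels = ch1
    agent1.sinks    = sink1

    # Source: watch a local spooling directory for new files
    # (illustrative; my real data source may be different)
    agent1.sources.src1.type     = spooldir
    agent1.sources.src1.spoolDir = /var/flume/incoming
    agent1.sources.src1.channels = ch1

    # Channel: in-memory buffer between source and sink
    agent1.channels.ch1.type                = memory
    agent1.channels.ch1.capacity            = 10000
    agent1.channels.ch1.transactionCapacity = 1000

    # Sink: write events into HDFS, bucketed by date
    agent1.sinks.sink1.type                   = hdfs
    agent1.sinks.sink1.hdfs.path              = hdfs://namenode:8020/flume/events/%Y-%m-%d
    agent1.sinks.sink1.hdfs.fileType          = DataStream
    agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
    agent1.sinks.sink1.channel                = ch1

I start the agent with:

    flume-ng agent --conf conf --conf-file flume.conf --name agent1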
How to install and configure apache flume?
http://flume.apache.org/FlumeUserGuide.html#setup
I am new to Flume and trying to understand it better. Any help would be greatly appreciated. Thanks!