I am running two word-count jobs on the same cluster (Hadoop 2.6.5, running locally on a multi-node cluster); my code runs the two jobs one after the other. Both jobs share the same mapper, reducer, etc., but each one has a different Partitioner.
Why is there a different allocation of the reduce tasks to the nodes in the second job? I identify the node running each reduce task by the node's IP address (obtained in Java). I know that with a different Partitioner the keys may go to different reduce tasks, but I want the node each partition lands on to stay unchanged.
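For context, this is roughly how I read the node's IP inside the task (a simplified sketch, not my exact code; `NodeIp` is just an illustrative class name):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class NodeIp {
    // Resolve the IP address of the node this JVM is running on,
    // the same way I log it from inside the reducer.
    public static String localIp() {
        try {
            return InetAddress.getLocalHost().getHostAddress();
        } catch (UnknownHostException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("reduce task running on " + localIp());
    }
}
```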
For example, I have five different keys and four reduce tasks. The allocation for Job 1 is:
- partition_1 -> NODE_1
- partition_2 -> NODE_1
- partition_3 -> NODE_2
- partition_4 -> NODE_3
The allocation for Job 2 is:
- partition_1 -> NODE_2
- partition_2 -> NODE_3
- partition_3 -> NODE_1
- partition_4 -> NODE_3
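To be clear about what I expect to stay fixed: the partition number for a given key is deterministic, since it is computed purely from the key and the reducer count, so only the partition-to-node assignment is changing between jobs. A plain-Java sketch of the default HashPartitioner formula (no Hadoop dependency; the key type is simplified to `String` and `StablePartition` is an illustrative name):

```java
public class StablePartition {
    // Same formula as Hadoop's default HashPartitioner:
    // (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.
    // The result depends only on the key and the number of reduce
    // tasks, so it is identical across job runs.
    public static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"alpha", "beta", "gamma", "delta", "epsilon"}) {
            System.out.println(key + " -> partition_" + partition(key, 4));
        }
    }
}
```

The open question is the other half of the mapping: which node the scheduler places each reduce task (partition) on, which is what differs between my two jobs.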