
I am learning the MapReduce framework and have the following questions:

  1. The MapReduce paradigm essentially has map() and reduce() (and a few others as well). Can all the programming logic be effectively represented as either a map() or a reduce()?

    For example, suppose I want to do an in-order traversal of a tree. Can this task be effectively divided only into a map() and reduce() task? If yes, how? If no, then how do I leverage the MapReduce framework for this task?

Generic code for in-order traversal:

// Iterative in-order traversal using an explicit stack.
// Requires: import java.util.Stack;
public void inOrderIter(TreeNode root) {

    if (root == null)
        return;

    Stack<TreeNode> s = new Stack<TreeNode>();
    TreeNode currentNode = root;

    while (!s.empty() || currentNode != null) {

        if (currentNode != null) {
            // Descend left, saving each node to visit on the way back up.
            s.push(currentNode);
            currentNode = currentNode.left;
        } else {
            // Visit the node, then move to its right subtree.
            TreeNode n = s.pop();
            System.out.printf("%d ", n.data);
            currentNode = n.right;
        }
    }
}
  2. Can we have only a map() without a corresponding reduce() and vice versa?

  3. As per this and this, the reduce() function generates the final output - is it necessary that it should generate only a single value?

  4. How do you decide if a certain task should be a part of map() or reduce()?

  5. Any general pointers on how to map-ify and reduce-ify a given task?

J. Doe

1 Answer


To answer your queries:

The MapReduce paradigm essentially has map() and reduce() (and a few others as well). Can all the programming logic be effectively represented as either a map() or a reduce()?

MapReduce is a design pattern, and hence it is suitable only for problems that fit a big-data context. You may be able to solve a problem with an algorithm expressed as a series of map-reduce jobs, but it may not be the most efficient solution in terms of execution parameters (resources and time required). At the same time, a traditional algorithm may not work at all (simply because the data is too large), while MapReduce may help.

Can we have only a map() without a corresponding reduce() and vice versa?

In the Java API, you may have a MapReduce job with no reduce phase, but not vice versa. You do, however, have the option of using the default IdentityMapper, so you never need to write map logic yourself.
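
A map-only job is configured by setting the number of reducers to zero. Below is a minimal sketch assuming Hadoop's newer org.apache.hadoop.mapreduce API; in that API the base Mapper class acts as the identity mapper, so the map output (here, the byte offset and text of each input line) is written directly as the final output.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only example");
        job.setJarByClass(MapOnlyJob.class);
        job.setMapperClass(Mapper.class);   // base Mapper is an identity mapper
        job.setNumReduceTasks(0);           // zero reducers: map output is the final output
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}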

As per this and this, the reduce() function generates the final output - is it necessary that it should generate only a single value?

No. You can write as many values as you like out of a mapper or reducer through the context.write() method, as long as you honor the output key/value types declared for the job in the API.
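
For instance, a reducer is free to call context.write() once per input value rather than once per key. A sketch, again assuming Hadoop's org.apache.hadoop.mapreduce API (the class name ExpandingReducer is made up for illustration):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// A reducer that emits one output record per input value, so a single
// key can produce many lines of final output.
public class ExpandingReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        for (IntWritable value : values) {
            context.write(key, value);   // multiple writes per key are allowed
        }
    }
}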

How do you decide if a certain task should be a part of map() or reduce()?

The majority of problems solved with MapReduce involve aggregation, joining two data sets, or some other kind of funneling of data down to a result. Once you understand the concepts and processing steps of MapReduce, you should be able to decide what to write in map() and/or reduce().

Any general pointers about how to map-ify and reduc-ify a given task?

Again, it depends on what you want to achieve. In general, map() is about reading the data set, filtering it (dropping unwanted records, or unwanted parts of records), and deciding which data should be clubbed together against a single key. The reducer is about processing the collection of data gathered against each key (as written by the mapper). The word-count sketch below illustrates this split.
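
Here is the canonical word count, sketched against Hadoop's org.apache.hadoop.mapreduce API: the mapper reads and tokenizes each line and picks the word as the key, and the reducer aggregates everything clubbed under that key. The class names are illustrative.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // map(): read each input line, filter out empty tokens, and choose the
    // grouping key (the word) that clubs related records together.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String token : line.toString().split("\\s+")) {
                if (token.isEmpty()) continue;   // filtering step
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    // reduce(): process the collection of values gathered against each key.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable count : counts) {
                sum += count.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}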

Gyanendra Dwivedi