1

Using Hadoop MapReduce

I have a list as input:

  1. A
  2. B
  3. C

And I want to get the Cartesian product of the list with itself:

  • A => A,f(A,A)
  • A => B,f(A,B)
  • A => C,f(A,C)
  • B => A,f(B,A)
  • B => B,f(B,B)
  • B => C,f(B,C)
  • C => A,f(C,A)
  • C => B,f(C,B)
  • C => C,f(C,C)

f() is a function that gives a value for a pair of keys.

How do I do that a in a simple manner using Hadoop MapReduce in Java?

Of course I can't hold the entire input list in memory.

Thanks!!

Romi Kuntsman
  • 442
  • 5
  • 13
  • http://stackoverflow.com/questions/1719594/iterative-cartesian-product-in-java – goat Jun 29 '13 at 23:40
  • Hi Chris, It is indeed trivial to do a Cartesian multiplication in Java when you can iterate both arrays in the same code. However, I need a solution for Hadoop, where data is streamed and partitioned. Thanks! – Romi Kuntsman Jun 30 '13 at 05:26
  • Can you use Pig Latin? As well as I know, Pig uses a tricky way to do the Cartesian product (called cross join in Pig). It's a complex manner so that I do not suggest you to implement it by yourself. – zsxwing Jun 30 '13 at 10:16
  • I need to do it directly without using Pig. Why would implementing a join of a list on itself should be highly complicated? – Romi Kuntsman Jun 30 '13 at 12:16

1 Answers1

1

You can implement it in Java map reduce. Let us assume, you want to do cross product between two files A and B with splits 3 and 4 respectively. Then you have to write custom input format that splits up the two datasets and then ensured there was a SPLIT for each subset of data.

So your splits would look like:

 A1 X B1
 A1 X B2
 A1 X B3
 A1 X B4
 A2 X B1
 A2 X B2
 A2 X B3
 A2 X B4
 A3 X B1
 A3 X B2
 A3 X B3
 A3 X B4

Use link https://github.com/adamjshook/mapreducepatterns/blob/master/MRDP/src/main/java/mrdp/ch5/CartesianProduct.java for your reference.

Ashish
  • 5,723
  • 2
  • 24
  • 25
  • Can you please explain *how* to implement that custom dataset to split up the two datasets correctly using Java+MapReduce? – Romi Kuntsman Jul 01 '13 at 13:56