I am very new to Apache Spark. I am trying to create a JavaPairRDD from a HashMap. I have a HashMap of type HashMap<String, HashMap<Integer, String>>. How can I convert it into a JavaPairRDD? I have pasted my code below:

HashMap<String, HashMap<Integer,String>> canlist =
    new HashMap<String, HashMap<Integer,String>>();

for (String key : entityKey)
{
    HashMap<Integer, String> clkey = new HashMap<Integer, String>();
    int f=0;
    for (String val :mentionKey)
    {
        //do something
        simiscore = (longerLength - costs[m.length()]) / (double) longerLength;

        if (simiscore > 0.6) {
            clkey.put(v1,val);
            System.out.print(
                " The mention  " + val + " added to link entity  " + key);
            }
            f++;
            System.out.println("Scan Completed");
    }
    canlist.put(key,clkey);
    JavaPairRDD<String, HashMap<Integer, String>> rad;
    rad = context.parallelize(scala.collection.Seq(toScalaMap(canlist)));

}
public static <String,Object> Map<String,Object> toScalaMap(HashMap<String,Object> m) {
    return (Map<String,Object>) JavaConverters.mapAsScalaMapConverter(m).asScala().toMap(
            Predef.<Tuple2<String,Object>>conforms()
    );}
}
galath
Rockan
  • It would be useful if you provide an expected output. I see at least two possible options: `JavaPairRdd>` or `JavaPairRdd, String>` – zero323 Jul 25 '15 at 16:38
  • The expected output is of the form `JavaPairRDD<String, HashMap<Integer, String>>` – Rockan Jul 25 '15 at 16:40
  • Using [`JavaConverters`](http://stackoverflow.com/q/11903167/1560062) to convert to Scala Map, and then calling `toSeq` should work. – zero323 Jul 25 '15 at 16:54
  • Is there any way to do it purely in Java? – Rockan Jul 25 '15 at 18:18
  • As far as I know `parallelize` requires `scala.collection.Seq` as an argument. – zero323 Jul 25 '15 at 18:27
  • Tried doing it this way. It gives me an error at scala.collection which says "Qualifier must be an expression". `JavaPairRDD<String, HashMap<Integer, String>> rad; rad = context.parallelize(scala.collection.Seq(toScalaMap(canlist))); } public static <String,Object> Map<String,Object> toScalaMap(HashMap<String,Object> m) { return (Map<String,Object>) JavaConverters.mapAsScalaMapConverter(m).asScala().toMap( Predef.<Tuple2<String,Object>>conforms() );` – Rockan Jul 25 '15 at 18:37
  • It would be better if you post it as an [edit](http://stackoverflow.com/posts/31628605/edit) to the question. Blocks of code in comments are rather hard to read. – zero323 Jul 25 '15 at 18:41

3 Answers

If you convert the HashMap into a List<scala.Tuple2<Integer, String>>, then you can use JavaSparkContext.parallelizePairs.

Daniel Darabos
  • Sorry I don't have a full example. I don't use Spark via Java. You're better off with Scala if you have that option! – Daniel Darabos Jul 25 '15 at 19:10
  • This worked. Although I had to change the HashMap to List> – Rockan Jul 27 '15 at 19:16
  • Cool! I've added a note about this in the answer. I don't know how you do the actual conversion in Java. Perhaps you could add the code to the answer; it sounds like it would be useful for future readers of this page. – Daniel Darabos Jul 28 '15 at 07:22

Here is another way: convert the Java HashMap<String, HashMap<Integer,String>> to a List<Tuple2<String, HashMap<Integer,String>>> and pass it to the parallelizePairs() method of JavaSparkContext.

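import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

// Copy each map entry into a Tuple2 so the list can be passed to parallelizePairs().
List<Tuple2<String, HashMap<Integer,String>>> list = new ArrayList<Tuple2<String, HashMap<Integer,String>>>();
for (Map.Entry<String, HashMap<Integer,String>> entry : canlist.entrySet()) {
    list.add(new Tuple2<String, HashMap<Integer,String>>(entry.getKey(), entry.getValue()));
}

// jsc is an existing JavaSparkContext.
JavaPairRDD<String, HashMap<Integer, String>> javaPairRdd = jsc.parallelizePairs(list);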
abaghel

Here is a generic method for the conversion. Pass its result to JavaSparkContext.parallelizePairs().

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import scala.Tuple2;

    //fromMapToListTuple2() generic method to convert Map<T1, T2> to List<Tuple2<T1, T2>>
    public static <T1, T2> List<Tuple2<T1, T2>> fromMapToListTuple2(Map<T1, T2> map)
    {
        //list of tuples
        List<Tuple2<T1, T2>> list = new ArrayList<Tuple2<T1, T2>>();

        //loop through all key-value pairs add them to the list
        for(T1 key : map.keySet())
        {
            //get the value
            T2 value = map.get(key);

            //Tuple2 is not like a traditional Java collection, but a single k-v pair;
            Tuple2<T1, T2> tuple2 = new Tuple2<T1, T2>(key, value);

            //populate the list with the created tuple2
            list.add(tuple2);
        } // for

        return list;
    } // fromMapToListTuple2
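
A minimal sketch of how a conversion like the one above might be exercised on data shaped like the question's canlist. To keep it runnable without Spark on the classpath, it uses a small hypothetical stand-in for scala.Tuple2; the class name MapToPairsSketch, the method name toPairs, and the sample values are assumptions made for illustration only. It iterates over entrySet() rather than keySet(), which avoids a second lookup per key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MapToPairsSketch {

    // Hypothetical stand-in for scala.Tuple2 so the sketch compiles without
    // Spark; in a real project, import scala.Tuple2 and delete this class.
    static final class Tuple2<A, B> {
        private final A first;
        private final B second;

        Tuple2(A first, B second) {
            this.first = first;
            this.second = second;
        }

        A _1() { return first; }
        B _2() { return second; }
    }

    // Same idea as fromMapToListTuple2: one Tuple2 per map entry.
    static <K, V> List<Tuple2<K, V>> toPairs(Map<K, V> map) {
        List<Tuple2<K, V>> pairs = new ArrayList<>(map.size());
        for (Map.Entry<K, V> entry : map.entrySet()) {
            pairs.add(new Tuple2<>(entry.getKey(), entry.getValue()));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Sample data shaped like canlist from the question (values are made up).
        HashMap<Integer, String> mentions = new HashMap<>();
        mentions.put(0, "NYC");
        HashMap<String, HashMap<Integer, String>> canlist = new HashMap<>();
        canlist.put("New York", mentions);

        List<Tuple2<String, HashMap<Integer, String>>> pairs = toPairs(canlist);
        System.out.println(pairs.size() + " pair(s), first key: " + pairs.get(0)._1());

        // With Spark available and jsc an existing JavaSparkContext, the list
        // (built from the real scala.Tuple2) would then be handed over like this:
        // JavaPairRDD<String, HashMap<Integer, String>> rdd = jsc.parallelizePairs(pairs);
    }
}
```

The order of the resulting list follows the map's iteration order, so a plain HashMap gives no ordering guarantee.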
CavaJ