
I have a HashMap&lt;String, ArrayList&lt;Integer&gt;&gt;. I want to serialize my HashMap object (hmap) to an HDFS location and later deserialize it in my Mapper and Reducers to use it.

To serialize my HashMap object to HDFS I used normal Java object serialization as follows, but got a permission-denied error:

try {
    FileOutputStream fileOut = new FileOutputStream("hashmap.ser");
    ObjectOutputStream out = new ObjectOutputStream(fileOut);
    out.writeObject(hmap);
    out.close();
} catch (Exception e) {
    e.printStackTrace();
}

I got the following exception

java.io.FileNotFoundException: hashmap.ser (Permission denied)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:110)
    at KMerIndex.createIndex(KMerIndex.java:121)
    at MyDriverClass.formRefIndex(MyDriverClass.java:717)
    at MyDriverClass.main(MyDriverClass.java:768)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Can someone please suggest or share sample code showing how to serialize an object to HDFS in Hadoop?

– Sumit

1 Answer


Please try using SerializationUtils from Apache Commons Lang.

Below are the relevant methods:

static Object   clone(Serializable object)  //Deep clone an Object using serialization.
static Object   deserialize(byte[] objectData) //Deserializes a single Object from an array of bytes.
static Object   deserialize(InputStream inputStream)  //Deserializes an Object from the specified stream.
static byte[]   serialize(Serializable obj) //Serializes an Object to a byte array for storage/serialization.
static void serialize(Serializable obj, OutputStream outputStream) //Serializes an Object to the specified stream.
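
For instance, here is a minimal in-memory round trip, as a sketch; the class name and map contents are made up for illustration, and the import assumes the commons-lang3 package (for Commons Lang 2.x it is org.apache.commons.lang.SerializationUtils):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import org.apache.commons.lang3.SerializationUtils;

public class RoundTripDemo {  // hypothetical demo class
    public static void main(String[] args) {
        HashMap<String, ArrayList<Integer>> hmap = new HashMap<>();
        hmap.put("key", new ArrayList<>(Arrays.asList(1, 2, 3)));

        // Serializable -> byte[] and back; the cast recovers the map.
        byte[] bytes = SerializationUtils.serialize(hmap);
        @SuppressWarnings("unchecked")
        HashMap<String, ArrayList<Integer>> copy =
                (HashMap<String, ArrayList<Integer>>) SerializationUtils.deserialize(bytes);
        System.out.println(copy);  // prints {key=[1, 2, 3]}
    }
}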

When storing to HDFS, you can write the byte[] returned by serialize. When reading it back, you deserialize and cast the result to the corresponding type, in your case the HashMap, and you get your object back.
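
Note that the original FileNotFoundException (Permission denied) came from java.io.FileOutputStream writing to the local filesystem rather than HDFS; writing through Hadoop's FileSystem API avoids that. Here is a minimal sketch of both directions, assuming that API and commons-lang3; the helper class name and method names are illustrative, not part of any library:

import java.util.ArrayList;
import java.util.HashMap;

import org.apache.commons.lang3.SerializationUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsMapSerde {  // hypothetical helper class

    // Serialize the map to a byte[] and write the bytes to an HDFS file.
    public static void writeMap(HashMap<String, ArrayList<Integer>> hmap,
                                Path hdfsPath, Configuration conf) throws Exception {
        byte[] bytes = SerializationUtils.serialize(hmap);
        FileSystem fs = FileSystem.get(conf);
        try (FSDataOutputStream out = fs.create(hdfsPath)) {
            out.write(bytes);
        }
    }

    // Open the HDFS file and deserialize straight from the stream;
    // the cast recovers the original HashMap.
    @SuppressWarnings("unchecked")
    public static HashMap<String, ArrayList<Integer>> readMap(
            Path hdfsPath, Configuration conf) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        try (FSDataInputStream in = fs.open(hdfsPath)) {
            return (HashMap<String, ArrayList<Integer>>)
                    SerializationUtils.deserialize(in);
        }
    }
}

In a Mapper or Reducer you would typically call something like readMap(...) from setup(), obtaining the Configuration via context.getConfiguration().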

In my case, I was storing a HashMap in an HBase column and retrieved it back in my mapper method as a HashMap, exactly as it was, and that worked.

You can surely do the same here.

Another option: you can also use Apache Commons IO (org.apache.commons.io.FileUtils) to write the byte array to a local file, but you will then need to copy that file to HDFS, since you want HDFS as your data store:

FileUtils.writeByteArrayToFile(new File("pathname"), myByteArray);
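
A sketch of that copy step, assuming Hadoop's FileSystem API; the destination path is hypothetical:

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
// Copy the locally written file into HDFS (destination path is illustrative).
fs.copyFromLocalFile(new Path("pathname"), new Path("/user/hadoop/hashmap.bytes"));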

Note: both JARs, Apache Commons IO and Apache Commons Lang, are always available in a Hadoop cluster.

– Ram Ghadiyaram
  • Thanks for the reply. Can you please tell me how to write the byte[] returned by SerializationUtils.serialize(obj) to HDFS? And here my serializable object will be the HashMap object, right? – Sumit May 31 '16 at 14:12
  • Yes, you are right. You can serialize the HashMap to a byte array, and if you deserialize it you get the same HashMap back. I have in fact done that; the only difference is that your data store is HDFS while mine was an HBase column, that's all. – Ram Ghadiyaram May 31 '16 at 14:17
  • Yes, HashMap implements Serializable; you can see it in the class hierarchy: https://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html – Ram Ghadiyaram May 31 '16 at 14:19
  • Thanks a lot for your answer, sir. – Sumit May 31 '16 at 14:26