I would like to keep an ArrayList that holds references to the objects I receive inside the reduce function.

@Override
public void reduce( final Text pKey,
                    final Iterable<BSONWritable> pValues,
                    final Context pContext )
        throws IOException, InterruptedException{
    final ArrayList<BSONWritable> bsonObjects = new ArrayList<BSONWritable>();

    for ( final BSONWritable value : pValues ){
        bsonObjects.add(value);
        //do some calculations.
    }
    for ( final BSONWritable value : bsonObjects ){
        //do something else.
    }
}

The problem is that bsonObjects.size() returns the correct number of elements, but all the elements of the list are equal to the last inserted element. For example, if the elements

{id:1}

{id:2}

{id:3}

are to be inserted, bsonObjects will hold 3 items but all of them will be {id:3}. Is there a problem with this approach? Any idea why this happens? I have tried changing the List to a Map, but then only one element was added to the map. I have also tried declaring bsonObjects globally, but the same behavior happens.

maxsap

1 Answer

This is documented behavior. The pValues Iterable reuses a single BSONWritable instance, and when its value changes on each iteration, every reference stored in the bsonObjects ArrayList sees the change as well. Calling add() on bsonObjects stores a reference, not a copy. This reuse allows Hadoop to save memory.
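The effect can be reproduced without Hadoop. The following is only a minimal sketch: `Holder` and `reusingValues` are hypothetical stand-ins for BSONWritable and the reducer's value Iterable, and the only assumption carried over is that the iterable hands back the same mutable instance on every step.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReuseDemo {
    /** Hypothetical mutable value, standing in for a Writable such as BSONWritable. */
    static final class Holder {
        int id;
    }

    /** Hypothetical iterable that returns the SAME Holder instance on every step. */
    static Iterable<Holder> reusingValues(final int... ids) {
        final Holder shared = new Holder();
        return () -> new Iterator<Holder>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < ids.length;
            }

            @Override
            public Holder next() {
                shared.id = ids[next++]; // overwrite the shared instance in place
                return shared;
            }
        };
    }

    /** Stores the references as handed out, then reads the ids back. */
    static List<Integer> collectReferences() {
        List<Holder> stored = new ArrayList<>();
        for (Holder value : reusingValues(1, 2, 3)) {
            stored.add(value); // stores a reference to the shared instance
        }
        List<Integer> seen = new ArrayList<>();
        for (Holder h : stored) {
            seen.add(h.id);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(collectReferences()); // prints [3, 3, 3]
    }
}
```

The list has three entries, but all three point at one object, so they all show the last id written.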

Inside the first loop you should create a new BSONWritable that is a deep copy of value, and add that copy to bsonObjects instead of value itself.

Try this:

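for ( final BSONWritable value : pValues ){
    // Copy the reused instance into a fresh BSONWritable; a plain
    // "BSONWritable v = value;" would only copy the reference.
    // ReflectionUtils is org.apache.hadoop.util.ReflectionUtils.
    final BSONWritable v = ReflectionUtils.copy( pContext.getConfiguration(), value, new BSONWritable() );
    bsonObjects.add(v);
    //do some calculations.
}
for ( final BSONWritable value : bsonObjects ){
    //do something else.
}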

Then you will be able to iterate through bsonObjects in the second loop and retrieve each distinct value.
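Why copying helps can be sketched in plain Java, without Hadoop. `Holder` and `reusingValues` are hypothetical stand-ins for BSONWritable and the reducer's value Iterable, and the copy constructor is an illustrative substitute for whatever deep-copy mechanism is used in a real job.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CopyDemo {
    /** Hypothetical mutable value with a copy constructor. */
    static final class Holder {
        int id;

        Holder() {
        }

        Holder(Holder other) {
            this.id = other.id; // only one field here, so this is a full copy
        }
    }

    /** Hypothetical iterable that returns the SAME Holder instance on every step. */
    static Iterable<Holder> reusingValues(final int... ids) {
        final Holder shared = new Holder();
        return () -> new Iterator<Holder>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < ids.length;
            }

            @Override
            public Holder next() {
                shared.id = ids[next++];
                return shared;
            }
        };
    }

    /** Stores a fresh copy per step, then reads the ids back. */
    static List<Integer> collectCopies() {
        List<Holder> stored = new ArrayList<>();
        for (Holder value : reusingValues(1, 2, 3)) {
            stored.add(new Holder(value)); // snapshot taken before the next overwrite
        }
        List<Integer> seen = new ArrayList<>();
        for (Holder h : stored) {
            seen.add(h.id);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(collectCopies()); // prints [1, 2, 3]
    }
}
```

Each stored element is now its own object, so later overwrites of the shared instance no longer affect it.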

However, you should also be careful: if you deep-copy every value, all the values for the key handled by this reduce call will need to fit in memory.

Girish Rao
  • You're missing a `new` invocation in your first loop, so this will still exhibit the same behaviour. You need to actually make a copy of the `value` before adding that deep copy to the `bsonObjects` list – Chris White Jun 12 '12 at 23:29
  • Right thanks. In the past for my purposes I've run toString() on the Object, thereby saving a String object instead of the reference. To make a deep copy, well, I'll just reference an earlier SO question: http://stackoverflow.com/questions/64036/how-do-you-make-a-deep-copy-of-an-object-in-java – Girish Rao Jun 12 '12 at 23:47
  • Just use `ReflectionUtils.copy(conf, src, dest);` – Chris White Jun 13 '12 at 00:15