I have an RDD, JavaPairRDD<String,Customer> RDD1,
which has one record:
cust_id, first_name, last_name
1 "rahul" "koshaley"
and another, JavaPairRDD<String,Customer> RDD2,
which also has one record:
cust_id, first_name, last_name
1 "rahul" ""
When I take the union of the two:
JavaPairRDD<String,Customer> unionRDD = RDD1.union(RDD2);
the result contains 2 records:
1) 1, "rahul", "koshaley"
2) 1, "rahul", ""
Now, when I call distinct on unionRDD, i.e.
JavaPairRDD<String,Customer> distinct = unionRDD.distinct();
will the resulting RDD contain
1, "rahul", "koshaley" or
1, "rahul", ""?
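A minimal plain-Java sketch (no Spark needed) of the equality rule that distinct() relies on: it keeps one copy of each element that is equal to another under equals()/hashCode(), and for a JavaPairRDD each element is the whole (key, value) pair. The sketch assumes a hypothetical Customer class whose equals()/hashCode() compare all three fields, and uses Map.Entry as a stand-in for Spark's Tuple2. Under that assumption the two records differ in last_name, so neither counts as a duplicate of the other:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class DistinctSketch {

    // Hypothetical stand-in for the question's Customer class.
    // Assumption: equals()/hashCode() compare all three fields.
    static final class Customer {
        final String custId;
        final String firstName;
        final String lastName;

        Customer(String custId, String firstName, String lastName) {
            this.custId = custId;
            this.firstName = firstName;
            this.lastName = lastName;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Customer)) return false;
            Customer c = (Customer) o;
            return custId.equals(c.custId)
                    && firstName.equals(c.firstName)
                    && lastName.equals(c.lastName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(custId, firstName, lastName);
        }
    }

    // The two records produced by RDD1.union(RDD2); Map.Entry plays the role of Tuple2.
    static List<Map.Entry<String, Customer>> sampleUnion() {
        return List.of(
                Map.entry("1", new Customer("1", "rahul", "koshaley")),
                Map.entry("1", new Customer("1", "rahul", "")));
    }

    // Mimics distinct(): drops only elements equal to one already seen.
    static <T> List<T> distinct(List<T> elements) {
        return new ArrayList<>(new LinkedHashSet<>(elements));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Customer>> result = distinct(sampleUnion());
        // Same key, but the Customer values differ, so both pairs are kept.
        System.out.println(result.size()); // prints 2
    }
}
```

The sketch only illustrates which elements count as duplicates; Spark does not guarantee the order of the output. If Customer does not override equals()/hashCode() at all, the default identity comparison applies, so even field-for-field identical records would generally not be collapsed.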
I want the output RDD to contain the record that has all values filled in,
i.e. 1, "rahul", "koshaley".
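If the goal is to keep the most complete record per cust_id rather than rely on distinct(), one option is to combine the records by key. Below is a plain-Java sketch of that idea under stated assumptions: the Customer fields and the preferComplete helper are hypothetical names, and the Map.merge loop stands in for JavaPairRDD.reduceByKey, to which a Function2<Customer, Customer, Customer> with the same logic could be passed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeSketch {

    // Hypothetical stand-in for the question's Customer class.
    static final class Customer {
        final String custId;
        final String firstName;
        final String lastName;

        Customer(String custId, String firstName, String lastName) {
            this.custId = custId;
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Hypothetical field-wise merge: for each field, prefer a non-empty value.
    static Customer preferComplete(Customer a, Customer b) {
        return new Customer(
                a.custId,
                a.firstName.isEmpty() ? b.firstName : a.firstName,
                a.lastName.isEmpty() ? b.lastName : a.lastName);
    }

    // Plain-Java analogue of reduceByKey: combines all values that share a key.
    static Map<String, Customer> reduceByKey(List<Map.Entry<String, Customer>> pairs) {
        Map<String, Customer> out = new LinkedHashMap<>();
        for (Map.Entry<String, Customer> e : pairs) {
            out.merge(e.getKey(), e.getValue(), MergeSketch::preferComplete);
        }
        return out;
    }

    // The two records from the union in the question.
    static List<Map.Entry<String, Customer>> sampleUnion() {
        return List.of(
                Map.entry("1", new Customer("1", "rahul", "koshaley")),
                Map.entry("1", new Customer("1", "rahul", "")));
    }

    public static void main(String[] args) {
        Customer c = reduceByKey(sampleUnion()).get("1");
        System.out.println(c.custId + ", \"" + c.firstName + "\", \"" + c.lastName + "\"");
        // prints 1, "rahul", "koshaley"
    }
}
```

Because the merge takes a non-empty field from either side, the result for these two records does not depend on which arrives first. If two records had different non-empty values for the same field, the first argument would win, and in Spark the order in which values are combined is not guaranteed.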
EDIT: This question is not a duplicate, because I need to know which of the duplicate records Spark will keep after the distinct operation.