I have two datasets named dataset1 and dataset2. dataset1 looks like this:
empid empname
101 john
102 kevin
and dataset2 looks like this:
empid empmarks empaddress
101 75 LA
102 69 NY
dataset2 will be very large, and I need to run some operations on these two datasets and get results that combine both of them.
As far as I know, I have two options for processing these datasets:
1. Store dataset1 (the smaller one) as a Hive lookup table and process both datasets through Spark.
2. Use Spark broadcast variables to ship dataset1 to the executors and process the datasets there.
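To make option 2 concrete, here is a minimal plain-Python sketch of the lookup-style join I have in mind. It is not Spark code. It only illustrates the idea that the small dataset is held in memory as a dictionary while the large dataset is streamed past it. The function name join_with_lookup and the tuple layout are my own assumptions for illustration. In Spark, the dictionary would be the thing that gets broadcast (for example with SparkContext.broadcast), so that each executor keeps one copy.

```python
# Plain-Python illustration of a map-side lookup join (the idea behind option 2).
# The sample rows are the ones shown in the question.

dataset1 = [(101, "john"), (102, "kevin")]        # small side: (empid, name)
dataset2 = [(101, 75, "LA"), (102, 69, "NY")]     # large side: (empid, empmarks, empaddress)

# Build the lookup once. In Spark this dict is what would be broadcast,
# so every executor reads it locally instead of shuffling dataset2.
lookup = dict(dataset1)


def join_with_lookup(rows, lookup):
    """Yield (empid, name, empmarks, empaddress) for rows whose empid is in lookup."""
    for empid, marks, address in rows:
        name = lookup.get(empid)
        if name is not None:  # inner-join semantics: rows with unknown ids are skipped
            yield (empid, name, marks, address)


joined = list(join_with_lookup(dataset2, lookup))
print(joined)
```

This only works while dataset1 fits comfortably in memory on each worker, which is the assumption behind option 2.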
Can anyone suggest which option is better?