I am wondering if anyone is aware of any discussion of joins vs. lookups in Spark. I have seen this page, "Lookup in spark dataframes", where everyone basically says that joins are far superior to lookups, but my google-fu turned up nothing that backs that up or even discusses the two approaches.
2 Answers
3
There is no such thing as a lookup in the Spark DataFrame API, so it is by definition inferior to any other solution. A join (hash or broadcast) or a local data structure is the only option.
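To make the "local data structure" option concrete, here is a minimal plain-Python sketch (not Spark code) of the idea behind a broadcast hash join: the small table is held as an in-memory dictionary and every row of the large table probes it. The data and the names `country_names`, `orders`, and `broadcast_hash_join` are hypothetical, chosen only for illustration.

```python
# Hypothetical small dimension table, held as a local dict (the "lookup" side).
country_names = {"DE": "Germany", "FR": "France"}

# Hypothetical large fact table: (order_id, country_code) rows.
orders = [(1, "DE"), (2, "FR"), (3, "US")]


def broadcast_hash_join(rows, small_table):
    """Left-join each row against an in-memory hash table.

    This mirrors the idea behind a broadcast join: the small side is
    available on every worker and is probed once per row of the large side.
    """
    return [
        (order_id, code, small_table.get(code))  # None when there is no match
        for order_id, code in rows
    ]


joined = broadcast_hash_join(orders, country_names)
```

In Spark itself, a similar effect is usually obtained by joining against a small DataFrame marked with the `broadcast` hint, so what feels like a "lookup" is still expressed as a join.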

user8833920
1
Lookups and Joins are two different concepts in relational data systems. Therefore, it doesn't really make sense in a general context to say that one is superior to the other because they have different functions. A lookup is simply finding data, sometimes using a key or hash value to optimize query speed. A join is using common elements in two data sets to create a new data set.
E.g. (completely hypothetical and abstract):
Lookup query 1 = 'Hello'
Join query 1, query 2 = 'Hello world' (if query 2 equals 'world')
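The same distinction can be sketched in plain Python. This is only an illustration of the two concepts, not Spark code, and the data sets `greetings` and `subjects` are hypothetical.

```python
# Hypothetical data sets keyed by a shared id.
greetings = {1: "Hello", 2: "Goodbye"}
subjects = {1: "world", 3: "moon"}

# Lookup: find one existing value by key; no new data set is created.
looked_up = greetings[1]

# Join (inner): combine the entries of both data sets that share a key
# into a new data set.
joined = {
    key: f"{greetings[key]} {subjects[key]}"
    for key in greetings.keys() & subjects.keys()
}
```

Here the lookup returns a single existing value, while the join produces a new data set containing only the keys present on both sides.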

Alex W