
I have two Spark DataFrames (I'm using Python), say A and B. A contains a string column (say "Name"), whereas B contains a column holding a list of strings (say "NamesList"). What I would like to do is join A and B on the condition that A.Name is contained in B.NamesList.

So to give you an example, A could be

+---+------+
| Id|  Name|
+---+------+
|  1|George|
|  2| Sarah|
+---+------+

B could be

+---+--------------------+
|Id2|           NamesList|
+---+--------------------+
|  6| [Bob, Alice, Sarah]|
|  7|[Thomas, Bob, Alice]|
+---+--------------------+

And I would like the result to be

+---+---+-----+-------------------+
| Id|Id2| Name|          NamesList|
+---+---+-----+-------------------+
|  2|  6|Sarah|[Bob, Alice, Sarah]|
+---+---+-----+-------------------+

Any ideas how to do this in an efficient way?
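One possible approach, sketched here under assumptions rather than as a definitive answer: explode NamesList so that each name gets its own row, then do an ordinary equi-join on Name. The function names `contains_join_rows` and `contains_join` are hypothetical, and the sketch assumes Id2 is unique in B. The first function spells out the intended semantics on plain Python rows; the second expresses the same idea with `pyspark.sql.functions.explode`.

```python
from collections import defaultdict


def contains_join_rows(a_rows, b_rows):
    """Plain-Python reference for the intended result.

    a_rows: iterable of (Id, Name); b_rows: iterable of (Id2, NamesList).
    Mirrors "explode NamesList, then equi-join on Name".
    """
    # "Explode": index every name to the B rows whose list contains it.
    # Using a set per row collapses duplicate names inside one NamesList.
    b_by_name = defaultdict(list)
    for id2, names in b_rows:
        for name in set(names):
            b_by_name[name].append((id2, names))
    # Equi-join: look each A.Name up in the index.
    return [
        (id_, id2, name, names)
        for id_, name in a_rows
        for id2, names in b_by_name.get(name, [])
    ]


def contains_join(a_df, b_df):
    """PySpark version of the same idea (requires pyspark)."""
    # Imported here so the reference function above runs without Spark.
    from pyspark.sql import functions as F

    # One row per (Id2, Name) pair; dropDuplicates keeps a repeated name
    # in one list from multiplying output rows (assumes Id2 is unique in B).
    exploded = (
        b_df.withColumn("Name", F.explode("NamesList"))
        .dropDuplicates(["Id2", "Name"])
    )
    return (
        a_df.join(exploded, on="Name", how="inner")
        .select("Id", "Id2", "Name", "NamesList")
    )


# The example tables from the question, as plain rows.
a_rows = [(1, "George"), (2, "Sarah")]
b_rows = [(6, ["Bob", "Alice", "Sarah"]), (7, ["Thomas", "Bob", "Alice"])]
result_rows = contains_join_rows(a_rows, b_rows)
print(result_rows)
```

With DataFrames A and B as above, `contains_join(A, B)` should give the single Sarah row. A shorter alternative is to join directly on `F.expr("array_contains(NamesList, Name)")`, but since that is not an equality condition Spark will likely have to compare every pair of rows, so the explode variant is probably the better fit when both tables are large.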

zero323
bettaberg
