I have a dataframe df
that looks something like this.
+----+
| Id |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+----+
What I want to do is use this Id
column in a SQL query to pull information from another table. This table I'm querying is massive so querying the entire thing and then doing a join isn't performant.
I tried something like this.
spark.sql(f"SELECT * FROM mytable WHERE Id IN {df.Id}")
but that doesn't seem to work. Is there another way to do what I'm attempting here?