I have one DataFrame (or I could split it into two DataFrames if necessary):
+---+-----------------+--------------------+
| id| director_name| movie_title|
+---+-----------------+--------------------+
| 01| james cameron| avatar|
| 02| gore verbinski|pirates caribbean...|
| 03| sam mendes| spectre|
| 04|christopher nolan| dark knight rises|
| 05| doug walker|star wars episode...|
| 06| andrew stanton| john carter|
| 07| sam raimi| spider man 3|
| 08| nathan greno| tangled|
| 09| joss whedon| avengers age ultron|
| 10| david yates|harry potter half...|
+---+-----------------+--------------------+
I want it to look like this:
+---+--------------------+
| id| key|
+---+--------------------+
| 01| james cameron|
| 02| gore verbinski|
| 03| sam mendes|
| 04| christopher nolan|
| 05| doug walker|
| 06| andrew stanton|
| 07| sam raimi|
| 08| nathan greno|
| 09| joss whedon|
| 10| david yates|
| 01| avatar|
| 02|pirates caribbean...|
| 03| spectre|
| 04| dark knight rises|
| 05|star wars episode...|
| 06| john carter|
| 07| spider man 3|
| 08| tangled|
| 09| avengers age ultron|
| 10|harry potter half...|
+---+--------------------+
I suspect the pandas method append() does exactly this, but I could not find an equivalent in PySpark. I apologize if I have overlooked something!
I would like to avoid converting to pandas, as this DataFrame might get pretty big.