
I have a list of integers and an SQLContext DataFrame whose number of rows equals the length of the list. I want to add the list to this DataFrame as a column, preserving the order. I feel like this should be really simple, but I can't find an elegant solution.

Timothy Elbert
  • Possible duplicate of [PySpark: Add a column to DataFrame when column is a list](https://stackoverflow.com/questions/36132899/pyspark-add-a-column-to-dataframe-when-column-is-a-list) – Rudr Jan 04 '19 at 20:19

1 Answer


You cannot simply add a list as a DataFrame column, because a list is a local object and a DataFrame is distributed. You can try one of the following approaches:

  • convert the DataFrame to a local collection with collect() or toLocalIterator(), then add the corresponding value from the list to each row, OR
  • convert the list to a DataFrame with an extra key column (holding keys that match the original DataFrame), then join the two
Mariusz
    I ended up doing the second because collect or toLocalIterator would have overwhelmed the memory. The trouble was that it took me a while to figure out how to do the second point, which is partly why I asked the question. I didn't ask this explicitly because I was hoping there was a more elegant way. – Timothy Elbert Nov 07 '16 at 14:17