I have a Spark DataFrame of ~70 million rows with three columns, ['id', 'date', 'val'], and a nested dictionary of the form

mydict = {
    'A': {
        '2018-09-31': val1,
        '2018-10-01': val2
    }
}

The outer keys ('A') are values in the id column and the inner keys are values in the date column. I am trying to update val based on this nested dictionary, accessed as mydict['A']['2018-09-31'] for example. Also, the update should only happen if the id is contained in a list, indexList.
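For reference, a toy version of the setup (the numeric values here are placeholders; only the shape matters):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Small stand-in for the ~70M-row DataFrame with columns ['id', 'date', 'val']
df = spark.createDataFrame(
    [('A', '2018-09-31', 1.0),
     ('A', '2018-10-01', 2.0),
     ('B', '2018-10-01', 3.0)],
    ['id', 'date', 'val'])

# Nested lookup: mydict[id][date] -> replacement value
mydict = {'A': {'2018-09-31': 10.0, '2018-10-01': 20.0}}

# Only rows whose id appears here should be updated
indexList = ['A']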

I have looked at and tried the methods from the questions below:

  1. Updating a dataframe column in spark
  2. replace values of one column in a spark df by dictionary key-values (pyspark)
  3. Pyspark: Replacing value in a column by searching a dictionary

Something like the following doesn't work:

import pyspark.sql.functions as F

update_func = (F.when(F.col('id').isin(indexList),
                      mydict[F.col('id')][F.col('date')])
                .otherwise(F.col('val')))
df = df.withColumn('new_val', update_func)

The error message I get is TypeError: unhashable type: 'Column', since mydict is a plain Python dictionary and F.col('id') is a Column object, which can't be used as a dictionary key.

Update: I avoided the problem by creating a new string key column combining the two columns used as keys, roughly as sketched below.
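A rough sketch of that approach: flatten the nested dictionary into a single-level dict keyed on the combined string, turn it into a literal map column with F.create_map, and look rows up by the same combined key (the '|' separator and the coalesce fallback are illustrative choices, not necessarily what I ended up with):

from itertools import chain

import pyspark.sql.functions as F

# Flatten {'A': {'2018-09-31': v, ...}} into {'A|2018-09-31': v, ...}
flat = {'{}|{}'.format(k, d): v
        for k, dates in mydict.items()
        for d, v in dates.items()}

# Literal map column built from alternating key, value literals
mapping = F.create_map(*[F.lit(x) for x in chain(*flat.items())])

# The same combined key, built from the two columns
key = F.concat_ws('|', F.col('id'), F.col('date'))

df = df.withColumn(
    'new_val',
    F.when(F.col('id').isin(indexList),
           # keep the old val when the combined key is not in the map
           F.coalesce(mapping.getItem(key), F.col('val')))
     .otherwise(F.col('val')))

If the dictionary is very large, converting it to a small DataFrame and doing a broadcast join on the two key columns would be another option.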
