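```python
from pyspark.sql.functions import col, udf
from pyspark.sql.types import FloatType, MapType, StringType

@udf(returnType=MapType(StringType(), FloatType()))
def postprocess(data):
    ret = dict()
    ...
    # insert keys and values into the dictionary from data
    ...

    return ret

ret = postprocess(col('data'))
print(ret)  # Column<'postprocess(data)'>
```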

I would like to create multiple columns from a dictionary (map) column.

If ret contains {"key1": 0.1, "key2": 0.3}, the result should be:

| key1 | key2 |
| ---- | ---- |
| 0.1  | 0.3  |

How can I create it?

alryosha

1 Answer

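To achieve your goal, you can start with explode(). Applied to a map column, it produces one row per map entry with `key` and `value` columns, so you then need to pivot those rows to get one column per key. If the keys are known in advance, selecting each key from the map directly is simpler. Details: https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.explode.html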

However, from a performance perspective, and depending on how complex your UDF is, I think you should build the columns with Spark SQL functions instead of a Python UDF if possible. You can check this post: https://stackoverflow.com/a/38297050/10445333

Jonathan Lam