Does a Spark GROUPED_MAP UDF on a data frame run in parallel?

Question

I'm trying to apply a PandasUDFType.GROUPED_MAP function that takes a data frame as input and returns a data frame as output. When I call sdf.groupby(key).apply(pandas_udf), does Spark apply the function to multiple groups in parallel, based on available resources, or sequentially, one group after another? I haven't changed any of Spark's default settings. What alternatives can I use if I want to run the UDF on groups in parallel?

score 1 · Answer 1 · answered Aug 12 '20 at 07:32


Yes, UDFs are executed in parallel, but their execution is not as well optimized as Spark's native functions.

More info here: Spark functions vs UDF performance?
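To make the parallelism concrete, here is a minimal sketch, assuming Spark 3.0 or later, where applyInPandas is the replacement for a GROUPED_MAP pandas_udf used with groupby(...).apply(...). The names center_values and run_grouped_map, the column names, and the sample rows are made up for illustration and do not come from the question. The groupby triggers a shuffle so that all rows for a key land in the same partition. Each partition is then handled by its own task, and tasks run concurrently up to the number of available executor cores. Groups that share a partition are processed one after another inside that task.

```python
import pandas as pd


def center_values(pdf: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical per-group function: receives all rows of ONE group
    # as a pandas DataFrame and returns a pandas DataFrame.
    return pdf.assign(value=pdf["value"] - pdf["value"].mean())


def run_grouped_map(spark):
    # `spark` is assumed to be an existing SparkSession (Spark 3.0+).
    sdf = spark.createDataFrame(
        [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 20.0)],
        ["key", "value"],
    )
    # The shuffle behind groupby places every row of a key in one partition.
    # Separate partitions are processed by separate tasks, which can run
    # at the same time on different cores.
    return sdf.groupby("key").applyInPandas(
        center_values, schema="key string, value double"
    )
```

The number of partitions after the shuffle, and therefore the upper bound on concurrent tasks for this stage, is controlled by spark.sql.shuffle.partitions (200 by default). Because the per-group function is plain pandas, it can be checked locally without a cluster, for example by calling center_values on a small pandas DataFrame.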


uxke

