| col1        | col2      |
|-------------|-----------|
| [1,2,3,4]   | [0,1,0,3] |
| [5,6,7,8]   | [0,3,4,8] |

Desired result:

| col1        | col2       |
|-------------|------------|
| [6,8,10,12] | [0,4,4,11] |
In Snowflake's Snowpark this is relatively straightforward using `array_construct`. Apache Spark has a similar `array` function, but there is a major difference.
In Snowpark, I can do `array_construct(count('*'), sum(col('x')), sum(col('y')), count(col('y')))`, but Apache Spark seems to treat `array()` itself as an aggregation and complains that I can't have an aggregation inside of an aggregation:
```
pyspark.sql.utils.AnalysisException: It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query.;
```
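The error's hint about a sub-query suggests splitting the work into two steps: run the plain aggregates first, then assemble the array from the already-aggregated columns in a follow-up `select`. A minimal PySpark sketch of that workaround (column names `x` and `y` are from my example above; the sample data is made up):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 0), (2, 1), (3, 0), (4, 3)], ["x", "y"])

# Step 1: compute the aggregates as ordinary named columns.
agg = df.agg(
    F.count("*").alias("cnt"),
    F.sum("x").alias("sum_x"),
    F.sum("y").alias("sum_y"),
    F.count("y").alias("cnt_y"),
)

# Step 2: wrap the already-aggregated columns in array() in a second
# select, so no aggregate sits inside the array() expression itself.
result = agg.select(F.array("cnt", "sum_x", "sum_y", "cnt_y").alias("stats"))
result.show(truncate=False)
```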
I'm trying to write a single piece of code that can handle both Snowpark and Apache Spark, but this `array_construct` vs. `array` difference is proving trickier than anticipated. Next up is to explore a groupBy & collect_list approach (a rough sketch of what I mean is below), but I'm wondering how others have solved this?
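For the element-wise sums in the desired-result table, the shape I have in mind is to `collect_list` each array column and then fold the collected arrays with `zip_with`. A sketch, assuming Spark 2.4+ (the SQL string uses the `aggregate`/`zip_with` higher-order functions, which predate the Python lambda API, and assumes all arrays have the same length):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [([1, 2, 3, 4], [0, 1, 0, 3]), ([5, 6, 7, 8], [0, 3, 4, 8])],
    ["col1", "col2"],
)

# Gather every row's array into one array-of-arrays per column.
collected = df.agg(
    F.collect_list("col1").alias("a1"),
    F.collect_list("col2").alias("a2"),
)

# Fold element-wise: seed with the first array, then zip_with-add
# the remaining arrays onto the accumulator.
def elementwise_sum(name):
    return F.expr(
        f"aggregate(slice({name}, 2, size({name}) - 1), {name}[0], "
        f"(acc, x) -> zip_with(acc, x, (a, b) -> a + b))"
    )

result = collected.select(
    elementwise_sum("a1").alias("col1"),
    elementwise_sum("a2").alias("col2"),
)
result.show(truncate=False)
# Expected, per the desired-result table: [6, 8, 10, 12] and [0, 4, 4, 11]
```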