I have seen multiple posts but the aggregation is done on multiple columns , but I want the aggregation based on col OPTION_CD, based on the following condition: If have conditions attached to the dataframe query, which is giving me the error 'DataFrame' object has no attribute '_get_object_id'
IF NULL(STRING AGG(OPTION_CD,'' order by OPTION_CD),''). What I can understand is that if OPTION_CD col is null then place a blank else append the OPTION_CD in one row separated by a blank.Following is the sample table :
first there is filteration to get only 1 and 2 from COl 1, then the result should be like this :
Following is the query that I am writing on my dataframe
df_result = df.filter((df.COL1 == 1)|(df.COL1 == 2)).select(df.COL1,df.COL2,(when(df.OPTION_CD == "NULL", " ").otherwise(df.groupBy(df.OPTION_CD))).agg(
collect_list(df.OPTION_CD)))
But not getting the desired results. Can anyone help in this? I am using pyspark.