from pyspark.sql import functions as F
from pyspark.sql.types import StringType

lst_Cols = ["DES", "INV", "MKT", "SHO"]
df_Description3 = df_Description1.fillna(value="0", subset=lst_Cols)

def Merge(c1, c2, c3, c4):
    # Return the first column value that is not the "0" placeholder.
    if c1 != "0":
        return c1
    elif c2 != "0":
        return c2
    elif c3 != "0":
        return c3
    elif c4 != "0":
        return c4
    return None

myudf = F.udf(Merge, StringType())
df_Description3 = df_Description3.withColumn("Descriptions", myudf("DES", "INV", "MKT", "SHO"))
df_Description3.show()
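As a sanity check outside Spark, the coalescing logic the UDF is meant to implement can be exercised as a plain Python function (the sample values here are hypothetical, for illustration only):

```python
def merge_plain(c1, c2, c3, c4):
    # Return the first value that is not the "0" placeholder, else None.
    for v in (c1, c2, c3, c4):
        if v != "0":
            return v
    return None

print(merge_plain("0", "inventory text", "0", "0"))  # inventory text
print(merge_plain("0", "0", "0", "0"))               # None
```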

Equinox
- What's the intent of the function `Merge()`? Can you share your target dataframe? – samkart Jul 21 '22 at 06:55
- Sharing your df_Description1 and target dataframe would be better. – Linus Jul 21 '22 at 07:05
- Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Jul 21 '22 at 09:25
1 Answer
Using UDFs is not recommended, as they can hurt performance (see this). Here is a solution using built-in Spark functions instead. Note that concat_ws() skips nulls while merging the columns, so you don't need the extra steps of filling with "0" and then removing it. If all four columns are null, the merged result is an empty string, so you can drop such rows by filtering out empty values at the end.
from pyspark.sql import functions as F

# concat_ws() skips nulls, so we can start from the original dataframe
# without the fillna("0") step.
df_Description3 = df_Description1.withColumn("Descriptions", F.concat_ws(" ", "DES", "INV", "MKT", "SHO"))
df_Description3 = df_Description3.filter(F.col("Descriptions") != "")
df_Description3.show()
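The null-skipping behavior of concat_ws() can be illustrated in plain Python. This is a simplified sketch of its semantics for scalar string columns (the real Spark function also flattens array columns), not Spark itself:

```python
def concat_ws(sep, *values):
    # Mimic Spark's concat_ws for scalar values:
    # drop None (null) entries and join the rest with the separator.
    return sep.join(v for v in values if v is not None)

# A row with some nulls keeps only the populated columns:
print(concat_ws(" ", "desc text", None, "mkt text", None))  # desc text mkt text

# A row where all four columns are null yields an empty string,
# which the filter step then drops:
print(concat_ws(" ", None, None, None, None))  # (empty string)
```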

viggnah