When I try to concatenate 3 ArrayType columns of a Spark DataFrame, I get incorrect output in some rows.
Some of the columns have no values in certain rows, so when they are combined the output comes out as, for example, [walmart, []]. I don't want the output to show those empty values. For example, the DataFrame has a column named concat_values with these values:
[walmart, supercenter, walmart supercenter, [walmartsupercenter]]
[walmart, []]
[mobil, []]
[[]]
[dollar general]
[marriott vacations, vacations worldwide, marriott vacations worldwide]
The output should be:
[walmart, supercenter, walmart supercenter, [walmartsupercenter]]
[walmart]
[mobil]
[]
[dollar general]
[marriott vacations, vacations worldwide, marriott vacations worldwide]
The UDF that I have implemented is:
from pyspark.sql.functions import col, udf
from pyspark.sql.types import ArrayType, StringType
from pyspark.sql import functions as F
concat_string_arrays = F.udf(lambda w, x, y, z: w + x + y + z, ArrayType(StringType()))
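A minimal sketch of how the UDF body could skip the empty values instead of concatenating blindly. It assumes the unwanted entries reach Python as None, an empty string, an empty list, or the literal string "[]" (which of these applies has not been confirmed). The names concat_non_empty and make_concat_udf are made up for illustration:

```python
def concat_non_empty(*arrays):
    """Concatenate any number of lists, dropping empty inputs and empty elements."""
    merged = []
    for arr in arrays:
        # Skip a column whose value is null (None) or an empty array.
        if not arr:
            continue
        for item in arr:
            # Assumption: "empty" means None, "", [] or the literal string "[]".
            if item is None or item == "" or item == [] or item == "[]":
                continue
            merged.append(item)
    return merged


def make_concat_udf():
    """Wrap the helper as a Spark UDF; imported lazily so the helper runs without Spark."""
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, StringType

    return F.udf(concat_non_empty, ArrayType(StringType()))
```

With hypothetical column names a, b and c, this could be applied as df.withColumn("concat_values", make_concat_udf()(F.col("a"), F.col("b"), F.col("c"))). Because the helper takes a variable number of arguments, the same UDF works for 3 or 4 input columns.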
Please help me with this. Thanks