I have the following PySpark DataFrame:
identification | p1 | p2 | p3 | p4 |
---|---|---|---|---|
1 | 1 | 0 | 0 | 1 |
2 | 0 | 1 | 1 | 0 |
3 | 0 | 0 | 0 | 1 |
I want to concatenate the values of columns p1 through p4 into a single string, like this:
identification | p1 | p2 | p3 | p4 | joined_column |
---|---|---|---|---|---|
1 | 1 | 0 | 0 | 1 | 1001 |
2 | 0 | 1 | 1 | 0 | 0110 |
3 | 0 | 0 | 0 | 1 | 0001 |
All columns from p1 to p4 are integers, so I tried to cast them to string before concatenating all columns (except the 'identification' column) like this:
from pyspark.sql.types import StringType
from pyspark.sql import functions as F

df_concat = df.withColumn(
    'joined_column',
    F.concat([F.col(c).cast(StringType()) for c in df.columns if c != 'identification'])
)
I get the following error:
TypeError: Invalid argument, not a string or column:
Is there a solution, or another way to concatenate all columns of a PySpark DataFrame into one string?
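For reference, here is a minimal sketch of what I suspect might work, assuming `df` is the DataFrame above and the problem is that `F.concat` expects the columns as separate arguments rather than as a single Python list:

```python
from pyspark.sql import functions as F

# Cast every column except 'identification' to string.
cols = [F.col(c).cast('string') for c in df.columns if c != 'identification']

# F.concat takes each column as a separate argument, so unpack the list with *.
# Note: F.concat returns null if any input column is null.
df_concat = df.withColumn('joined_column', F.concat(*cols))

# Alternative: F.concat_ws with an empty separator; unlike F.concat,
# it skips null values instead of nulling out the whole result.
df_concat = df.withColumn('joined_column', F.concat_ws('', *cols))
```

The `TypeError` above seems consistent with this: passing a list means `F.concat` receives one argument that is neither a string nor a Column.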