TL;DR:
You can do this:
from functools import reduce
from operator import add
from pyspark.sql.functions import col
df.na.fill(0).withColumn("result", reduce(add, [col(x) for x in df.columns]))
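Here's a minimal, self-contained sketch of that; the SparkSession setup and the toy data are illustrative assumptions, not part of the original snippet:

from functools import reduce
from operator import add

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Toy DataFrame with a null, to show why na.fill(0) matters.
df = spark.createDataFrame(
    [(1, 2, 3), (4, None, 6)],
    ["col1", "col2", "col3"],
)

# Fill nulls with 0, then sum every column into "result".
df.na.fill(0).withColumn("result", reduce(add, [col(x) for x in df.columns])).show()
# +----+----+----+------+
# |col1|col2|col3|result|
# +----+----+----+------+
# |   1|   2|   3|     6|
# |   4|   0|   6|    10|
# +----+----+----+------+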
Explanation:
The df.na.fill(0) portion is there to handle nulls in your data. If you don't have any nulls, you can skip that and do this instead:
df.withColumn("result", reduce(add, [col(x) for x in df.columns]))
If you have a static list of columns, you can do this:
df.withColumn("result", col("col1") + col("col2") + col("col3"))
But if you don't want to type out the whole column list, you need to generate the expression col("col1") + col("col2") + col("col3") iteratively. For this, you can use reduce with the add function to get this:
reduce(add, [col(x) for x in df.columns])
The columns are added two at a time, so you actually get (col("col1") + col("col2")) + col("col3") instead of col("col1") + col("col2") + col("col3"), but the effect is the same.
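You can watch the same left-to-right pairing with plain integers (a toy example in plain Python, nothing PySpark-specific):

from functools import reduce
from operator import add

# reduce folds the list two items at a time, from the left:
# reduce(add, [1, 2, 3]) evaluates as add(add(1, 2), 3), i.e. (1 + 2) + 3
print(reduce(add, [1, 2, 3]))  # 6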
The col(x) ensures that you are adding Column objects rather than plain strings: without it, add would simply concatenate the column names and produce the string "col1col2col3".
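To see that pitfall directly, run reduce over the raw column names (plain Python again, using the hypothetical names col1, col2, col3):

from functools import reduce
from operator import add

# Without col(x) the items are plain strings, and + concatenates them:
print(reduce(add, ["col1", "col2", "col3"]))  # col1col2col3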