
I need to add the values of columns t1, t2, t3, t4, and t5 row-wise and store the result in a new column called "totaltime" in PySpark. The dataframe is of the following format:

 +--------+--------+------+------+------+------+
 |    Ser |    t1  |  t2  |  t3  |  t4  |  t5  |
 +--------+--------+------+------+------+------+
 |07142017|      84|   187|   214|   119|     7|
 |20170714|      84|   187|   209|   115|     8|
 |20170715|      83|   188|   208|   119|     6|
 |20170716|      84|   188|   206|   106|     5|
 |20170714|      86|   188|   209|   119|     4|
 +--------+--------+------+------+------+------+

I wrote the following code:

sum1 = df1.select("t1","t2","t3","t4","t5").sum()
df1 = df1.withColumn("totaltime",sum1)

I get the following error:

AttributeError: 'DataFrame' object has no attribute 'sum'

How do I do this in PySpark?

1 Answer


A PySpark DataFrame has no `.sum()` method, which is why you get the `AttributeError`. Instead, build a single column expression by adding the columns together:

 df1 = df1.withColumn('totaltime', sum(df1[col] for col in ["t1","t2","t3","t4","t5"]))

This works because pyspark's `Column` overloads the `+` operator, so Python's built-in `sum` chains the columns into the expression `t1 + t2 + t3 + t4 + t5`, evaluated row by row.
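For intuition, here is a minimal pure-Python sketch of the mechanism (no Spark required; the `Col` class below is a toy stand-in for pyspark's `Column`, not the real API). Chaining `+` over a list of column objects builds one combined expression:

```python
from functools import reduce
import operator

# Toy stand-in illustrating how pyspark's Column overloads "+":
# each addition returns a new object wrapping a larger expression.
class Col:
    def __init__(self, expr):
        self.expr = expr

    def __add__(self, other):
        return Col(f"({self.expr} + {other.expr})")

cols = [Col(c) for c in ["t1", "t2", "t3", "t4", "t5"]]

# reduce(operator.add, ...) folds the list left-to-right with "+",
# the same thing Python's built-in sum does over real Columns.
total = reduce(operator.add, cols)
print(total.expr)  # -> ((((t1 + t2) + t3) + t4) + t5)
```

One real-data caveat: in Spark, adding a NULL to anything yields NULL, so if any of t1..t5 can be null, wrap each column in `F.coalesce(df1[c], F.lit(0))` before summing.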