I have a PySpark data frame with 7 columns. I need to add a new column named "sum" that holds, for each row, the number of columns that contain data (are not null). In the example data frame, the part highlighted in yellow is the required answer.
1 Answer
This sum can be calculated like this:
from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType

# Assumes an existing SparkSession named `spark`
df = spark.createDataFrame([
    (1, "a", "xxx", None, "abc", "xyz", "fgh"),
    (2, "b", None, 3, "abc", "xyz", "fgh"),
    (3, "c", "a23", None, None, "xyz", "fgh")
], ("ID", "flag", "col1", "col2", "col3", "col4", "col5"))

# One 0/1 flag per column (1 = not null), added up across the row.
# `sum` here must be Python's built-in sum, not pyspark.sql.functions.sum.
df2 = df.withColumn("sum", sum([(~F.isnull(df[c])).cast(IntegerType()) for c in df.columns]))
df2.show()
+---+----+----+----+----+----+----+---+
| ID|flag|col1|col2|col3|col4|col5|sum|
+---+----+----+----+----+----+----+---+
|  1|   a| xxx|null| abc| xyz| fgh|  6|
|  2|   b|null|   3| abc| xyz| fgh|  6|
|  3|   c| a23|null|null| xyz| fgh|  5|
+---+----+----+----+----+----+----+---+
Hope this helps!

michalrudko
-
Thanks, but it's giving me the error "Column is not iterable". – Jan 19 '20 at 15:05
-
Actually, in my case it's `F.sum(...)` that gives me that error, so please check your imports and this should work. – michalrudko Jan 19 '20 at 16:09
-
I have imported like this: `from pyspark.sql import functions as F` and `from pyspark.sql.types import IntegerType`. It is still not working. – Jan 19 '20 at 17:10
-
Again, this must have something to do with the imports. Maybe you imported something earlier that overrides some functions? Please remove all the preceding imports and try again. You can find another explanation of this issue here: https://stackoverflow.com/a/53868119/4113409 . There is nothing else I can do here... – michalrudko Jan 19 '20 at 17:51
-
You're adding up values across the columns of a row, so make sure you're using Python's built-in sum function, not the PySpark one (from sql.functions). From the linked answer: "You can delete the reference of the pyspark function with `del sum`." – michalrudko Jan 19 '20 at 17:54
-
Thank you very much. After deleting sum (`del sum`), it worked. – Jan 19 '20 at 18:51
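A possible alternative to `del sum` is to reach Python's built-in through the standard `builtins` module, which still refers to the original function even if the bare name `sum` has been overridden by an import. The sketch below is plain Python, not PySpark: it mirrors the per-row counting logic on the sample rows from the answer, and `count_non_null` is a hypothetical helper name used only for illustration.

```python
import builtins

# Sample rows copied from the answer's createDataFrame call
rows = [
    (1, "a", "xxx", None, "abc", "xyz", "fgh"),
    (2, "b", None, 3, "abc", "xyz", "fgh"),
    (3, "c", "a23", None, None, "xyz", "fgh"),
]

def count_non_null(row):
    """Count the values in `row` that are not None.

    builtins.sum is used explicitly, so the result does not depend on
    whatever the bare name `sum` currently points to.
    """
    return builtins.sum(int(value is not None) for value in row)

counts = [count_non_null(row) for row in rows]
print(counts)  # [6, 6, 5], matching the "sum" column in the answer
```

In the PySpark expression from the answer, the same idea would mean writing `builtins.sum([...])` in place of `sum([...])`.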
-
Glad to hear that! Please mark this as the accepted answer if you're happy with the result. Thanks! – michalrudko Jan 19 '20 at 19:41
-
I already did, but it is showing me "Votes cast by those with less than 15 reputations are recorded but do not change the publicly displayed post score." – Jan 20 '20 at 05:10
-
Ah, OK, thanks then :) I guess it will be visible as soon as you get 15 reputation. Good luck! – michalrudko Jan 20 '20 at 11:20