
I don't have much experience with PySpark.

I need to check, in a Spark DataFrame, how many values in a row are greater (in absolute value) than a certain threshold. I tried this and it doesn't work:

n = lit(len(df.columns))
rank= (reduce(add, (1 for x in df.columns[1:] if abs(col(x)) > threshold))).alias(rank)

1 Answer


You cannot use Python conditionals here: a Column has no defined truth value, so the if filter inside the generator fails. Instead use when / otherwise (Spark Equivalent of IF Then ELSE).

from functools import reduce
from operator import add

from pyspark.sql.functions import abs, col, when

# 1 if the absolute value exceeds the threshold, 0 otherwise, summed across columns
reduce(
    add,
    [when(abs(col(x)) > threshold, 1).otherwise(0) for x in df.columns[1:]])
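
Not from the original answer, but here is a quick end-to-end sketch of how the expression could be attached as a column; the sample data, column names, and threshold value are only assumptions for illustration:

from functools import reduce
from operator import add

from pyspark.sql import SparkSession
from pyspark.sql.functions import abs, col, when

spark = SparkSession.builder.getOrCreate()

# hypothetical data: the first column is an id, the rest are numeric values
df = spark.createDataFrame(
    [("a", 1.0, -5.0, 2.0), ("b", 0.5, 0.1, -0.2)],
    ["id", "x1", "x2", "x3"])
threshold = 1.0

count_expr = reduce(
    add,
    [when(abs(col(x)) > threshold, 1).otherwise(0) for x in df.columns[1:]])

# attach the per-row count as a new column
df.withColumn("rank", count_expr).show()
# row "a" -> 2 (|-5.0| and |2.0| exceed 1.0), row "b" -> 0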