
So, I want to create a new column in my dataframe whose values depend on two other columns and a condition.

I tried this, but it doesn't work.

some_value = ...
# fails: the ternary forces bool() on a Column comparison, raising ValueError
df = df.withColumn("new_col", col("col1") if col("col2") == some_value else None)

What is the correct way of doing this?

MetallicPriest
    you can take a look at `when` and `otherwise`; if that doesn't help, please create a testable example – anky Apr 29 '20 at 17:22
  • 1
    Or `expr` is a possibility. – thebluephantom Apr 29 '20 at 17:23
  • 2
    Does this answer your question? [PySpark: Create New Column And Fill In Based on Conditions of Two Other Columns](https://stackoverflow.com/questions/51565395/pyspark-create-new-column-and-fill-in-based-on-conditions-of-two-other-columns) What happened to Google?! – mazaneicha Apr 29 '20 at 17:31

3 Answers


A trivial example using expr; you could also use when:

import org.apache.spark.sql.functions.expr

val df3 = df2.withColumn("new_col", expr("case when c1 = 1 and c2 = 101 then c1 + c2 else 999 end"))
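
Since the question uses PySpark rather than Scala, here is a sketch of the same expr approach in Python, assuming the same df2 with columns c1 and c2 as in the Scala example:

from pyspark.sql.functions import expr

# same CASE WHEN expression, evaluated by Spark SQL
df3 = df2.withColumn("new_col", expr("case when c1 = 1 and c2 = 101 then c1 + c2 else 999 end"))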
thebluephantom

This worked for me:

from pyspark.sql.functions import col, when

new_col_expr = when(col("col2").eqNullSafe(some_value), col("col1")).otherwise(None)
df = df.withColumn("new_col", new_col_expr)
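
A note on the eqNullSafe choice: unlike ==, it treats NULL as an ordinary value, so the condition can still match when some_value is None; a plain == comparison would evaluate to NULL there and the row would fall through to otherwise. A minimal runnable sketch with hypothetical demo data:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when

spark = SparkSession.builder.getOrCreate()
# hypothetical data: row "b" has a null col2
df = spark.createDataFrame([("a", 1), ("b", None)], ["col1", "col2"])
some_value = None
df = df.withColumn("new_col", when(col("col2").eqNullSafe(some_value), col("col1")).otherwise(None))
df.show()  # row "b" gets "b" in new_col; with == it would stay null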
MetallicPriest

You can use the `when` function.

from pyspark.sql.functions import col, lit, when

newval = '10002'
df2 = df.withColumn("new_col", when(col("col1") == lit(newval), col("col1")).otherwise(None))
df2.show()
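
As a usage note, the second argument to when must be a Column such as col("col1") to copy the column's value; a bare Python string is treated as a literal. A small sketch of the difference, with hypothetical data:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit, when

spark = SparkSession.builder.getOrCreate()
# hypothetical single-column data
df = spark.createDataFrame([("10002",), ("10003",)], ["col1"])
newval = '10002'
df.withColumn("as_column", when(col("col1") == lit(newval), col("col1")).otherwise(None)) \
  .withColumn("as_literal", when(col("col1") == lit(newval), lit("col1")).otherwise(None)) \
  .show()
# as_column copies the value "10002"; as_literal fills in the text "col1"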
Piyush Patel