
I am trying to use a "chained when" function in PySpark. In other words, I'd like a column to map to more than two possible outputs.

I tried using the same logic as nested IF functions in Excel:

  df.withColumn("device_id", when(col("device") == "desktop", 1).otherwise(when(col("device") == "mobile", 2)).otherwise(None))

But that doesn't work since I can't put a tuple into the "otherwise" function.

Asked by Fede, edited by Grr

1 Answer


Have you tried:

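from pyspark.sql import functions as F
# Chain .when() for each condition; .otherwise() supplies the fallback value once, at the end
df.withColumn('device_id', F.when(F.col('device') == 'desktop', 1).when(F.col('device') == 'mobile', 2).otherwise(None))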

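Note that when chaining `when` calls you do not need to wrap the successive calls in an `otherwise`: call `.when()` again for each extra condition and use a single `.otherwise()` at the end for the default value. If `.otherwise()` is omitted, rows that match no condition get null.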

Answered by Grr, edited by pault
  • Super useful the `when` chaining composition! Thanks! – raul ferreira Oct 18 '18 at 15:06
  • These are chained operations - if multiple when() are true, does the variable get assigned the first condition that is True or the last one? – Thomas Nov 18 '18 at 18:55
  • @Thomas If multiple consecutive when() statements are true, only the first when() evaluated to true is considered. – sondrelv Oct 16 '19 at 13:43