
I've seen this question, which I think has the solution:

ValueError: Cannot convert column into bool

But I can't figure out how to apply it to my dataframe.

My original code:

if df2['DayOfWeek']>=6 : 
   df2['WeekendOrHol'] = 1

this gives me the error:

Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
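
For context, the message appears because Python's `if` calls `bool()` on the comparison result, and that result is a lazy column expression rather than a single True/False. A minimal pure-Python sketch of the mechanism, using a hypothetical `LazyColumn` class (an illustration only, not PySpark's actual implementation):

```python
class LazyColumn:
    """Toy stand-in for a column expression (hypothetical, not PySpark's Column)."""

    def __init__(self, expr):
        self.expr = expr

    def __ge__(self, other):
        # A comparison builds a new expression instead of evaluating rows.
        return LazyColumn(f"({self.expr} >= {other!r})")

    def __bool__(self):
        # A whole column has no single truth value, so refuse the conversion.
        raise ValueError("Cannot convert column into bool")


day_of_week = LazyColumn("DayOfWeek")
condition = day_of_week >= 6  # still an expression, nothing evaluated yet

try:
    if condition:  # `if` calls bool(condition), which raises
        pass
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(condition.expr)
print(error_message)
```

The condition has to be handed to Spark as an expression (for example through `withColumn`) rather than tested with a Python `if`.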

So based on the above link I tried:

from pyspark.sql.functions import when
when((df2['DayOfWeek']>=6),df2['WeekendOrHol'] = 1)   
when(df2['DayOfWeek']>=6,df2['WeekendOrHol'] = 1)

but this is incorrect as it gives me an error too.
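
The error from this attempt is most likely a plain Python `SyntaxError` rather than anything Spark-specific: inside a call, `df2['WeekendOrHol'] = 1` is parsed as a malformed keyword argument. `when` expects a condition and a value, and the target column is named separately in `withColumn`. A small sketch that checks this without needing Spark, by compiling the attempted call from a string:

```python
# The attempted call, kept as a string so it can be compiled without running it.
attempt = "when(df2['DayOfWeek'] >= 6, df2['WeekendOrHol'] = 1)"

try:
    compile(attempt, "<attempt>", "eval")
    is_syntax_error = False
except SyntaxError:
    # Python rejects the assignment inside the argument list at parse time.
    is_syntax_error = True

print(is_syntax_error)
```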

pault
Reddspark
  • you can do `df2.withColumn("WeekendOrHol", (df2["DayOfWeek"]>=6).cast("int"))`. This is essentially the same question as [this post](https://stackoverflow.com/questions/55437751/pyspark-i-want-to-manually-map-the-values-of-one-of-the-columns-in-my-dataframe/55438115#55438115) – pault Apr 03 '19 at 13:25
  • Thanks - I think this is the best answer. I'll see if I can get this marked as a duplicate. – Reddspark Apr 04 '19 at 01:00
  • Can't mark it as a duplicate since the other question doesn't have an upvoted or accepted answer yet. – pault Apr 04 '19 at 14:46

2 Answers


To update a column based on a condition, use `when` inside `withColumn`:

from pyspark.sql import functions as F

# Set `WeekendOrHol` to 1 where `DayOfWeek` >= 6; otherwise keep its current
# value (or substitute something else). This requires `WeekendOrHol` to exist
# already. Without `.otherwise(...)`, non-matching rows are set to null (None).
df2 = df2.withColumn(
    'WeekendOrHol',
    F.when(F.col('DayOfWeek') >= 6, F.lit(1)).otherwise(F.col('WeekendOrHol'))
)

Hope this helps, good luck!

mkaran
  • Thanks - I had to add an extra bracket at the end, and ensure the new field already existed in the dataframe first (using `df2=df2.withColumn('WeekendOrHol', F.lit(0))`). However I think pault's answer above is the most elegant so will go with that. – Reddspark Apr 04 '19 at 01:01

Best answer as provided by pault:

df2=df2.withColumn("WeekendOrHol", (df2["DayOfWeek"]>=6).cast("int"))

This is a duplicate of the question linked in pault's comment on the question above.

Reddspark