I am trying to create classes in a new column, based on keywords found in another column. To do that, I need to combine multiple .contains() conditions, but none of my attempts work.

def classes_creation(data):
    df = data.withColumn("classes", when(data.where(F.col("MISP_RFW_Title").like('galleys') | F.col("MISP_RFW_Title").like('coffee')),"galleys") ).otherwise(lit(na))
    return df
# RETURNS ERROR
def classes_creation(data):
     df = data.withColumn("classes", when(col("MISP_RFW_Title").contains("galleys").contains("word"), 'galleys').otherwise(lit(na))
     return df
# RETURNS COLUMN OF NA ONLY
def classes_creation(data):
     df = data.withColumn("classes", when(col("MISP_RFW_Title").contains("galleys" | "word"), 'galleys').otherwise(lit(na))
     return df
# RETURNS COLUMN OF NA ONLY
J. Perez

1 Answer

If I understood your requirements correctly, you can match several keywords at once by passing a regex alternation to rlike:

from pyspark.sql.functions import col, when
data.withColumn("classes", when(col("MISP_RFW_Title").rlike("galleys|word"), 'galleys').otherwise('a'))

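Alternatively, if the conditions apply to different columns, you can combine separate contains() checks with the | operator, wrapping each check in parentheses if it is anything more than a single method call:

data.withColumn("classes", when(col("MISP_RFW_Title").contains("galleys") | col("MISP_RFW_Title").contains("word"), 'galleys').otherwise('a'))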
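
If the keyword list grows, the alternation pattern can be built from a list instead of typed by hand. The following is a minimal sketch, not a definitive implementation: `build_keyword_pattern` and the `keywords`/`label` parameters are hypothetical names introduced here, the keywords are taken from the question, and it assumes a null value is wanted where nothing matches. PySpark is imported inside the function so the pattern helper can be checked without a Spark session.

```python
import re


def build_keyword_pattern(keywords):
    """Join keywords into one regex alternation, escaping regex metacharacters."""
    # Hypothetical helper: ["galleys", "coffee"] becomes "galleys|coffee".
    return "|".join(re.escape(k) for k in keywords)


def classes_creation(data, keywords=("galleys", "coffee"), label="galleys"):
    """Add a 'classes' column: `label` where the title matches any keyword, else null."""
    # Imported here so build_keyword_pattern works without Spark installed.
    from pyspark.sql import functions as F

    pattern = build_keyword_pattern(keywords)
    return data.withColumn(
        "classes",
        # rlike takes a regex string; the condition passed to when() must be a Column.
        F.when(F.col("MISP_RFW_Title").rlike(pattern), label).otherwise(F.lit(None)),
    )
```

Calling `classes_creation(data)` on a DataFrame that has a `MISP_RFW_Title` column should behave like the rlike version above, with null in place of 'a' for rows that do not match.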
rock321987
  • I got the following error when implementing your code : `in when raise TypeError("condition should be a Column") TypeError: condition should be a Column` – J. Perez Nov 07 '19 at 13:41
  • @J.Perez works fine for me.. what is the Spark Version you are using? can you share some of the data? – rock321987 Nov 07 '19 at 14:44