
I have a data frame with columns (A, B), where column B is free text that I am converting to a type (NOT_FOUND, TOO_LOW_PURCHASE_COUNT, etc.) so that it aggregates better. I created a switch case covering all possible patterns and their respective types, but it is not working.

def getType(x: String): String = x match {
    case "Item % not found %" =>"NOT_FOUND"
    case "%purchase count % is too low %" =>"TOO_LOW_PURCHASE_COUNT"
    case _ => "Unknown"
}

getType("Item 75gb not found") 

val newdf = df.withColumn("updatedType",getType(col("raw_type"))) 

This gives me "Unknown". Can someone tell me how to write a switch case that behaves like the SQL LIKE operator?

user3407267
    there is no "like operator" in scala. Sounds like you need to read up on [regular expressions](https://www.scala-lang.org/api/2.12.3/scala/util/matching/Regex.html) – Dima Oct 04 '18 at 21:28
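
The regular-expression route suggested in the comment can be sketched in plain Scala as follows. This is only an illustration under some assumptions: the names NotFound and TooLowPurchaseCount are made up here, and each "%" from the question's patterns has been rewritten by hand as ".*". A string literal in a case clause is compared by equality, which is why the original version always falls through to "Unknown".

```scala
import scala.util.matching.Regex

// Regex equivalents of the LIKE-style patterns from the question
// ("%" rewritten as ".*"); the val names are illustrative only.
val NotFound: Regex = "Item .* not found.*".r
val TooLowPurchaseCount: Regex = ".*purchase count .* is too low.*".r

// A Regex used as an extractor must match the whole input string.
def getType(x: String): String = x match {
  case NotFound()            => "NOT_FOUND"
  case TooLowPurchaseCount() => "TOO_LOW_PURCHASE_COUNT"
  case _                     => "Unknown"
}
```

Note that this function works on a String, not on a Spark Column, so it cannot be passed col("raw_type") directly; it would have to be wrapped in a UDF or replaced by column expressions.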

2 Answers

1

Use when together with like on the column, so the matching runs as a native Spark SQL expression:

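import org.apache.spark.sql.functions.when
import spark.implicits._  // for toDF and $"..."; assumes a SparkSession named spark (already imported in spark-shell)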

val df = Seq(
  "Item foo not found",  "Foo purchase count 1 is too low ", "#!@"
).toDF("raw_type")

val newdf = df.withColumn(
  "updatedType",
  when($"raw_type" like "Item % not found%", "NOT_FOUND")
    .when($"raw_type" like "%purchase count % is too low%", "TOO_LOW_PURCHASE_COUNT")
    .otherwise("Unknown")
)

Result:

newdf.show
// +--------------------+--------------------+
// |            raw_type|         updatedType|
// +--------------------+--------------------+
// |  Item foo not found|           NOT_FOUND|
// |Foo purchase coun...|TOO_LOW_PURCHASE_...|
// |                 #!@|             Unknown|
// +--------------------+--------------------+

Reference:

0

The SQL wildcard "%" corresponds to ".*" in a regular expression, so each LIKE pattern can be translated to a regex, and a UDF can then match the value against the patterns:

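import org.apache.spark.sql.functions.udf

// SQL LIKE patterns mapped to their target type
val originalSqlLikePatternMap = Map("Item % not found%" -> "NOT_FOUND",
  // 20 other patterns here
  "%purchase count % is too low %" -> "TOO_LOW_PURCHASE_COUNT")
// Translate each LIKE pattern to a Java regex: "%" becomes ".*"
val javaPatternMap = originalSqlLikePatternMap.map(v => v._1.replaceAll("%", ".*") -> v._2)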

val df = Seq(
  "Item foo not found ", "Foo purchase count 1 is too low ", "#!@"
).toDF("raw_type")

val converter = (value: String) => javaPatternMap.find(v => value.matches(v._1)).map(_._2).getOrElse("Unknown")
val converterUDF = udf(converter)

val result = df.withColumn("updatedType", converterUDF($"raw_type"))
result.show(false)

Output:

+--------------------------------+----------------------+
|raw_type                        |updatedType           |
+--------------------------------+----------------------+
|Item foo not found              |NOT_FOUND             |
|Foo purchase count 1 is too low |TOO_LOW_PURCHASE_COUNT|
|#!@                             |Unknown               |
+--------------------------------+----------------------+
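
A plain replaceAll("%", ".*") leaves any other regex metacharacters in the pattern (such as "." or "(") active. The sketch below shows one way the translation could be made stricter. The helper names likeToRegex and classify are hypothetical, and the code assumes that no LIKE escape character is in use. It also maps the LIKE single-character wildcard "_" to ".".

```scala
import java.util.regex.Pattern

// Hypothetical helper: translate a SQL LIKE pattern into a Java regex.
// "%" becomes ".*", "_" becomes ".", every other character is quoted
// so that it is matched literally. LIKE escape characters are not handled.
def likeToRegex(like: String): String =
  like.map {
    case '%' => ".*"
    case '_' => "."
    case c   => Pattern.quote(c.toString)
  }.mkString

// A Seq keeps the rules in declaration order, so the first match wins.
val patterns: Seq[(String, String)] = Seq(
  "Item % not found%" -> "NOT_FOUND",
  "%purchase count % is too low%" -> "TOO_LOW_PURCHASE_COUNT"
).map { case (like, label) => likeToRegex(like) -> label }

// Return the label of the first pattern that matches the whole value.
def classify(value: String): String =
  patterns
    .collectFirst { case (regex, label) if value.matches(regex) => label }
    .getOrElse("Unknown")
```

classify could be wrapped with udf in the same way as converter above. A Seq is used instead of a Map here because a Map does not promise a lookup order once it holds many entries, which matters when more than one pattern can match the same value.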
pasha701