
I don't understand what is going wrong here.

The problem is that the AND (&&) operator does not seem to work: everything is directed to the else branch. Only when I leave out the && operator do some of the if conditions work. Please look at the age and ageGroup columns below and compare them with the UDF declaration. How can ages 6 and 7 be adults while 20 is a kid?

Output (screenshot of the resulting DataFrame with the age and ageGroup columns; image not available)

Here is my code:

All Spark imports and initializations

import org.apache.spark.{ SparkConf, SparkContext }
import org.apache.spark.sql.types._
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions.udf

case class Person(name: String, address: String, state: String, age: Int, phone: Int, order: String)

val df = Seq(
("adnan", "migi way", "texas", 10, 333, "AX-1"),
("dim", "gigi way", "utah", 6,222, "AX-2"),
("alvee", "sigi way", "utah", 9,222, "AX-2"),
("john", "higi way", "georgia", 20,111, "AX- 3")).toDF("name","address","state","age","phone", "order")


val df1 = datafile.map(_.split("\\|")).map(attr => Person(attr(0).toString, attr(1).toString, attr(2).toString, attr(3).toInt, attr(4).toInt, attr(5).toString)).toDF()

UDF Code below

def ageFilter = udf((age: Int) => {
  if (age >= 2 && age <= 9) "bacha"
   if (age >= 10 ) "kiddo"
    else "adult"
  })

Calling the UDF

val one_hh_ages = df1.withColumn("ageGroup", ageFilter($"age"))

This is where I got help from: Apache Spark, add an "CASE WHEN ... ELSE ..." calculated column to an existing DataFrame

xem

1 Answer


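The problem is that the first condition in your UDF has no effect. In Scala, an if without an else that is not the last expression in a block has its value discarded, so the function does not return at that point but continues with the next if statement, and only that second if/else decides the result.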
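To see this outside Spark, here is a minimal plain-Scala sketch. The names IfElseDemo, buggy and fixed are made up for illustration; buggy copies the shape of the UDF body from the question, and fixed chains the conditions with else if.

```scala
// Hypothetical names for illustration; no Spark needed.
object IfElseDemo {
  // Same shape as the UDF body in the question: the first `if` has no `else`,
  // so its value is discarded and only the second if/else decides the result.
  def buggy(age: Int): String = {
    if (age >= 2 && age <= 9) "bacha"
    if (age >= 10) "kiddo"
    else "adult"
  }

  // Chaining with `else if` turns the three branches into one expression.
  def fixed(age: Int): String =
    if (age >= 2 && age <= 9) "bacha"
    else if (age >= 10) "kiddo"
    else "adult"

  def main(args: Array[String]): Unit = {
    for (age <- Seq(6, 9, 10, 20))
      println(s"age=$age buggy=${buggy(age)} fixed=${fixed(age)}")
  }
}
```

With the buggy version, ages 6 and 9 come out as "adult" and 20 as "kiddo", which matches the output described in the question. The compiler may also warn that the first if is a pure expression in statement position.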

You can rewrite it like this, using an else if:

def ageFilter = udf((age: Int) => {
  if (age >= 2 && age <= 9) "bacha"
  else if (age >= 10 ) "kiddo"
  else "adult"
})

or this with pattern matching:

def ageFilter = udf((age: Int) => {
  age match {
    case a if a >= 2 && a <= 9 => "bacha"
    case a if a >= 10          => "kiddo"
    case _                     => "adult"
  }
})

But you should really check your logical conditions (10 and older is a kiddo? younger than 2 is an adult?).
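One quick way to check the conditions is to run the same logic as a plain function over a few boundary values before wrapping it in a UDF. This is only a sketch: AgeGroupCheck, ageGroup and the sample ages are made up for illustration, and the thresholds are copied unchanged from the code above.

```scala
// Hypothetical helper for illustration; thresholds copied from the UDF above.
object AgeGroupCheck {
  def ageGroup(age: Int): String =
    if (age >= 2 && age <= 9) "bacha"
    else if (age >= 10) "kiddo"
    else "adult"

  def main(args: Array[String]): Unit = {
    // Values on both sides of each threshold, plus one clearly adult age.
    val samples = Seq(0, 1, 2, 9, 10, 20, 65)
    samples.foreach(a => println(s"$a -> ${ageGroup(a)}"))
  }
}
```

Here 0 and 1 come out as "adult" while 65 comes out as "kiddo", which is probably not what the labels are meant to say.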

Raphael Roth
  • It's just for testing. Silly me, I figured it out last night but couldn't delete the question. I forgot about if-else statements, huh! I need to delete this question; any suggestions how? – xem Nov 09 '16 at 15:27