-1

I want 'Experience' column data in single column like the below one (hire_date is not printed here.it is the column which is not being displayed below but I want to show it in my data):

 emp_id | emp_name | salary  |       Experience
--------+----------+---------+-------------------------
  68319 | KAYLING  | 6000.00 | 26 years 2 mons 12 days
  67858 | SCARLET  | 3100.00 | 20 years 9 mons 11 days  

Below is my query with result:

sqlContext.sql("SELECT emp_id,emp_name,hire_date,salary,datediff(current_date(),hire_date) as (Experience) FROM employees WHERE (salary/30)>100").show()  

Result:

+------+--------+----------+------+----------+
|emp_id|emp_name| hire_date|salary|Experience|
+------+--------+----------+------+----------+
| 68319| KAYLING|1991-11-18|6000.0|      9763|
| 67858| SCARLET|1997-04-19|3100.0|      7784|  

Schema:

|-- emp_id: long (nullable = true)
 |-- emp_name: string (nullable = true)
 |-- job_name: string (nullable = true)
 |-- manager_id: long (nullable = true)
 |-- hire_date: date (nullable = true)
 |-- salary: double (nullable = true)
 |-- commision: double (nullable = true)
 |-- dep_id: long (nullable = true)  

I tried creating sample UDF for getting difference in year in Scala but it keeps on giving me the error. Below is my code with error:

def getYearValue(value:java.util.Date):String= {
val year1:String=year(current_date()-year(hire_date);
year1;
}

Error:

<console>:26: error: type mismatch;
 found   : org.apache.spark.sql.Column
 required: String
       val year1:String=year(current_date());  

So how do I modify my UDF code to get the expected format?

RushHour
  • 494
  • 6
  • 25
  • I tried using the post which was referred as having solution to my problem but using that I am still not able to get the exact solution. Using that I was getting days as mentioned above and every data is in a separate column instead of single column – RushHour Aug 15 '18 at 15:36
  • I haven't got solution for this yet :( – RushHour Aug 17 '18 at 05:16
  • I got an answer to this question but can't post it since people have marked my answer as duplicate one – RushHour Aug 17 '18 at 14:20

1 Answers1

1

That's not how you use org.apache.spark.sql.functions.*

  • current_date returns Column.
  • year takes Column and returns Column.

Additionally, Spark uses java.sql.Date not java.util.date. If you wanted to use UDF (UserDefinedFunction) you should

val getYearValue = udf((value: java.sql.Date) => {
   val year1: String = java.time.LocalDate.now.getYear.toString
   year1
})

And if you wanted to use year and current_date functions you should operate on Columns.

val ds: Dataset[Row] = ???

val current_year: Column = year(current_date())

ds.select(current_year)
  • Thanks for your response but its little bit complicated for me..Could you please provide me the UDF for getting difference between current year and the provided year ?? That would be helpful – RushHour Aug 12 '18 at 16:56
  • I am getting errors while executing your codes. for eg when I tried executing your UDF it gave me error: object time is not a member of package java val year1: String = java.time.LocalDate.now.getYear.toString – RushHour Aug 12 '18 at 17:04
  • I have also modified my question for getting the difference in years – RushHour Aug 12 '18 at 17:06