
In this DataFrame I am computing the total salary for each group. In Oracle I'd use this query:

select job_id,sum(salary) as "Total" from hr.employees group by job_id;

I tried the same in Spark SQL, but I am facing two issues:

empData.groupBy($"job_id").sum("salary").alias("Total").show()
  1. The alias "Total" is not applied; the column is still displayed as "sum(salary)".
  2. I could not use $ (I think that is the Scala column syntax); I get a compilation error with:

     empData.groupBy($"job_id").sum($"salary").alias("Total").show()
    

Any idea?

  • Possible duplicate of [Column alias after groupBy in pyspark](https://stackoverflow.com/questions/33516490/column-alias-after-groupby-in-pyspark) – 10465355 Oct 11 '18 at 10:28

1 Answer


Use the aggregate method .agg() if you want to provide an alias name. It accepts the Scala column syntax ($"..."):

empData.groupBy($"job_id").agg(sum($"salary") as "Total").show()

If you don't want to use .agg(), the alias can also be provided via .select(), renaming the auto-generated "sum(salary)" column:

empData.groupBy($"job_id").sum("salary").select($"job_id", $"sum(salary)".alias("Total")).show()
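As a further option (not part of the original answer), the auto-generated column can be renamed after the aggregation with .withColumnRenamed(); this sketch assumes empData is the questioner's DataFrame:

```scala
// Rename the auto-generated "sum(salary)" column after aggregating.
// Assumes empData has at least the columns "job_id" and "salary".
empData.groupBy($"job_id")
  .sum("salary")
  .withColumnRenamed("sum(salary)", "Total")
  .show()
```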
vdep