2

How to pass variable arguments to the cube function in Spark SQL, and also to the agg function of the cube?

I have a list of columns, and I want to apply the cube function to those columns and then apply an aggregation function to the result.

For example:

val columnsInsideCube = List("data", "product","country")
val aggColumns = List("revenue")

I want something like this:

dataFrame.cube(columns:String*).agg(aggcolumns:String*)

This is not the same as passing a Scala array to cube. cube is a predefined method on the DataFrame, so the arguments have to be passed to it in the proper form.

Devndra
  • 41
  • 4
  • 1
    I formatted your text and fixed your grammar, because I love you. Next time do it yourself, thank you. And don't forget: "I" is always capitalized in English! – peterh Jun 14 '16 at 15:12
  • 2
    Possible duplicate of [How pass scala Array into scala vararg method?](http://stackoverflow.com/questions/31064753/how-pass-scala-array-into-scala-vararg-method) – zero323 Jun 15 '16 at 01:30

1 Answer

0

You could use

Spark's DataFrame.cube (new in version 1.4):

# cube is a method on DataFrame, so no separate import is needed
df.cube("name", df.age).count().orderBy("name", "age").show()

see also How to use "cube" only for specific fields on Spark dataframe?
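
To pass the column lists from the question instead of hard-coded names, the lists can be unpacked into the call. The following is only a minimal sketch: `cube_agg` and its `agg_func` parameter are hypothetical names, and summing is an assumption because the question does not say which aggregation is wanted. It relies on `cube` accepting column names as varargs and on `agg` accepting a dict that maps column names to aggregate function names. In Scala, the analogous step would be the `: _*` varargs expansion.

```python
def cube_agg(df, cube_columns, agg_columns, agg_func="sum"):
    """Cube `df` over `cube_columns` and aggregate each of `agg_columns`.

    `cube_agg` and `agg_func` are hypothetical names for this sketch;
    "sum" is an assumed default, not something stated in the question.
    """
    # *cube_columns unpacks the list, e.g. cube("data", "product", "country").
    grouped = df.cube(*cube_columns)
    # agg accepts a dict of {column name: aggregate function name}.
    return grouped.agg({column: agg_func for column in agg_columns})


# Usage, assuming `data_frame` is an existing DataFrame with these columns:
# columns_inside_cube = ["data", "product", "country"]
# agg_columns = ["revenue"]
# cube_agg(data_frame, columns_inside_cube, agg_columns).show()
```

The dict form keeps the helper free of extra imports. If different aggregations per column are needed, a list of column expressions unpacked with `*` would likely be the more flexible choice.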


or HiveSQL

GROUP BY a, b, c WITH CUBE

which is equivalent to

GROUP BY a, b, c GROUPING SETS ( (a, b, c), (a, b), (b, c), (a, c), (a), (b), (c), ( ))

https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup#space-menu-link-content
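
To see where the eight grouping sets above come from, here is a small illustrative sketch in plain Python (no Spark or Hive involved). The helper name `cube_grouping_sets` is hypothetical; it simply enumerates every subset of the cubed columns, which is what CUBE expands to.

```python
from itertools import combinations


def cube_grouping_sets(columns):
    """Return every grouping set that CUBE over `columns` expands to."""
    cols = list(columns)
    grouping_sets = []
    # Walk from the full column list down to the empty (grand total) set.
    for size in range(len(cols), -1, -1):
        grouping_sets.extend(combinations(cols, size))
    return grouping_sets


sets = cube_grouping_sets(["a", "b", "c"])
print(len(sets))  # 8, i.e. 2 ** 3 subsets
for grouping_set in sets:
    print(grouping_set)
```

The number of grouping sets doubles with each additional column, so cubing over many columns can become expensive.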


or you could use other libraries like

import com.activeviam.sparkube._
InLaw
  • 2,537
  • 2
  • 21
  • 33