1

This may be a really silly question, but given:

val ds3 = ds.groupBy($"ip")
            .avg("humidity") 

it is not clear to me how, for a Dataset rather than a DataFrame, I can rename the resulting column on the fly, the way alias does. I tried a few things, but to no avail: there were no errors, but also no effect.

I would like "avg_humidity" as the column name.

Extending the question, what if I issue:

val ds3 = ds.groupBy($"ip")
            .avg() 

How to handle that?

thebluephantom

2 Answers

1

avg does not provide an alias function, so you need an extra withColumnRenamed:

val ds3 = ds.groupBy($"ip")
  .avg("humidity")
  .withColumnRenamed("avg(humidity)","avg_humidity")

Alternatively, you can use .agg(avg("humidity").as("avg_humidity")), with org.apache.spark.sql.functions.avg imported:

val ds3 = ds.groupBy($"ip").agg(avg("humidity").as("avg_humidity"))
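For the follow-up case, where avg() is called with no arguments and every numeric column is averaged, one option is to rename all result columns in a single pass. The following is only a sketch: the sample data, the `temp` column, and the helper name `cleanName` are assumptions made for illustration, and it relies on Spark naming the result columns `avg(<column>)`, as in the withColumnRenamed example above.

```scala
import org.apache.spark.sql.SparkSession

object RenameAvgColumns {
  // Matches names of the form "avg(<column>)".
  private val AvgPattern = """avg\((.+)\)""".r

  // Hypothetical helper: "avg(humidity)" becomes "avg_humidity";
  // any other name (for example the grouping column) is left untouched.
  def cleanName(name: String): String = name match {
    case AvgPattern(inner) => s"avg_$inner"
    case other             => other
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rename-avg-columns")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for the question's ds.
    val ds = Seq(
      ("10.0.0.1", 40.0, 21.5),
      ("10.0.0.1", 60.0, 22.5),
      ("10.0.0.2", 55.0, 19.0)
    ).toDF("ip", "humidity", "temp")

    // avg() with no arguments averages every numeric column.
    val averaged = ds.groupBy($"ip").avg()

    // toDF(names: _*) replaces all column names at once.
    val renamed = averaged.toDF(averaged.columns.map(cleanName): _*)
    renamed.printSchema()

    spark.stop()
  }
}
```

Using toDF with the mapped names avoids chaining one withColumnRenamed call per column.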
QuickSilver
  • Do you mean what happens if you execute `ds.groupBy($"ip").avg()`, @thebluephantom? Or do you mean an exception? – QuickSilver Jun 15 '20 at 12:37
  • When using the functions, I note that we get DataFrames back again. I saw that a long time ago, and in fact it does not appear to have changed. Disappointing. – thebluephantom Jun 15 '20 at 12:37
  • Ohh, now I get it @thebluephantom – QuickSilver Jun 15 '20 at 12:41
  • I am studying for a certification and thought I would check DF vs. DS. I still see many issues with DS. A plain select and the like sort of work, but an aggregation means you get a DF back. Yes, I know they are interchangeable, but there is plenty of work left to do. Anyway, I can rename all the columns myself, as I see there are limits to DSs, which you confirm. Cheers – thebluephantom Jun 15 '20 at 12:45
1

groupBy(cols: Column*) returns a RelationalGroupedDataset.

avg(colNames: String*) on it returns a DataFrame, so by calling as(alias: String) on the result you are simply assigning an alias to the new DataFrame, not to its column(s).
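To make the distinction concrete, here is a minimal sketch. The case classes `Reading` and `IpAvg` and the sample rows are assumptions for illustration, not part of the question. It first applies as(alias: String) to the aggregated DataFrame and prints the column names, which are not affected by the alias, then aliases the column inside agg and converts the result back to a typed Dataset with as[IpAvg].

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.avg

// Hypothetical record types, declared at top level so Spark can derive encoders.
case class Reading(ip: String, humidity: Double)
case class IpAvg(ip: String, avg_humidity: Double)

object TypedAverage {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("typed-average")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for the question's ds.
    val ds: Dataset[Reading] = Seq(
      Reading("10.0.0.1", 40.0),
      Reading("10.0.0.1", 60.0),
      Reading("10.0.0.2", 55.0)
    ).toDS()

    // as(alias: String) names the Dataset as a whole; the columns keep their names.
    val aliased = ds.groupBy($"ip").avg("humidity").as("avg_humidity")
    println(aliased.columns.mkString(", "))

    // Alias the column inside agg, then map the DataFrame back to a typed Dataset.
    val typed: Dataset[IpAvg] = ds
      .groupBy($"ip")
      .agg(avg($"humidity").as("avg_humidity"))
      .as[IpAvg]
    typed.show()

    spark.stop()
  }
}
```

The as[IpAvg] step only works because the column names and types of the aggregated DataFrame line up with the fields of the case class.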

SO discussion about renaming columns in a DataFrame is here.

mazaneicha