I have the following expression,

val pageViews = spark.sql(
      s"""
         |SELECT
         |  proposal,
         |  MIN(timestamp) AS timestamp,
         |  MAX(page_view_after) AS page_view_after
         |FROM page_views
         |GROUP BY proposal
         |""".stripMargin
    ).createOrReplaceTempView("page_views")

I want to convert it into one that uses the Dataset API:

val pageViews = pageViews.selectExpr("proposal", "MIN(timestamp) AS timestamp", "MAX(page_view_after) AS page_view_after").groupBy("proposal")

The problem is that I can't call createOrReplaceTempView on this one; the build fails.

My question is: how do I convert the first one into the second one and create a temp view from it?

2 Answers

You can get rid of the SQL expression altogether by using Spark SQL's functions

import org.apache.spark.sql.functions._

as below

pageViews
      .groupBy("proposal")
      .agg(min("timestamp").as("timestamp"), max("page_view_after").as("page_view_after"))

QuickSilver
  • can you help and suggest how to handle this https://stackoverflow.com/questions/62036791/while-writing-to-hdfs-path-getting-error-java-io-ioexception-failed-to-rename – BdEngineer May 27 '20 at 06:44

Assuming you have a DataFrame named pageViews, use:

 pageViews
      .groupBy("proposal")
      .agg(expr("min(timestamp) AS timestamp"), expr("max(page_view_after) AS page_view_after"))
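
To also cover the temp view part of the question, here is a minimal end-to-end sketch. It assumes a local SparkSession and a few made-up sample rows; the object name PageViewsTempView and the helper earliestAndLatest are hypothetical names used only for illustration. The build fails in the question because groupBy on its own returns a RelationalGroupedDataset, which has no createOrReplaceTempView; calling agg on it gives a DataFrame back. Note also that createOrReplaceTempView returns Unit, so assigning its result to a val (as the SQL version in the question does) leaves you with nothing to chain on.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{max, min}

object PageViewsTempView {

  // Hypothetical helper: same aggregation as the SQL query in the question.
  // groupBy(...) yields a RelationalGroupedDataset; agg(...) turns it back
  // into a DataFrame that can be registered as a view.
  def earliestAndLatest(raw: DataFrame): DataFrame =
    raw
      .groupBy("proposal")
      .agg(
        min("timestamp").as("timestamp"),
        max("page_view_after").as("page_view_after")
      )

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("page-views-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows standing in for the real page_views data.
    val raw = Seq(
      ("p1", 10L, 1L),
      ("p1", 5L, 3L),
      ("p2", 7L, 2L)
    ).toDF("proposal", "timestamp", "page_view_after")

    val pageViews = earliestAndLatest(raw)

    // Returns Unit, so keep it as a separate statement.
    pageViews.createOrReplaceTempView("page_views")

    spark.sql("SELECT * FROM page_views ORDER BY proposal").show()
    spark.stop()
  }
}
```

Keeping the aggregated DataFrame in its own val means you can keep using it through the Dataset API and also query it by name from spark.sql.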
Som