What is wrong with this usage of `first`? I want to take the first row for each `id` in my DataFrame, but it returns an error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Could not resolve window function 'first_value'. Note that, using window functions currently requires a HiveContext;

The code is:

WindowSpec window = Window.partitionBy(df.col("id"));
df= df.select(first(df.col("*")).over(window));

I am using a HiveContext.

lte__
  • Can you, as a test, try the following code: `WindowSpec window = Window.partitionBy(df.col("id")); df = df.select(first(df.col("id")).over(window));` It's possible that the window function cannot be used with `*`. – T. Gawęda Sep 09 '16 at 10:27

1 Answer


Did you read/create your Spark DataFrame with a plain SQLContext or with a HiveContext? Window functions require a HiveContext.

More detail here: Window function is not working on Pyspark sqlcontext
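As an aside, the result you are after — the first row seen for each `id` — can be sketched in plain Java to make the intended semantics concrete. This is only an illustration of what `first(...).over(Window.partitionBy("id"))` is meant to compute, not Spark code; the class and method names here are made up for the example:

```java
import java.util.*;

public class FirstPerId {
    // Keep the first row encountered for each id (row[0] is the id column),
    // mirroring the semantics of first(...).over(Window.partitionBy("id")).
    static List<String[]> firstPerId(List<String[]> rows) {
        Map<String, String[]> first = new LinkedHashMap<>();
        for (String[] row : rows) {
            first.putIfAbsent(row[0], row); // later rows with the same id are ignored
        }
        return new ArrayList<>(first.values());
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[]{"a", "1"},
            new String[]{"a", "2"},
            new String[]{"b", "3"});
        for (String[] r : firstPerId(rows)) {
            System.out.println(r[0] + "," + r[1]);
        }
    }
}
```

Note that in Spark, unlike this sketch, "first" within a partition is non-deterministic unless the window also specifies an ordering.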

phi