
The following code produces a DataFrame containing the values 0 to 3:

from delta.tables import DeltaTable
from pyspark.sql.functions import col

df = DeltaTable.forPath(spark, '/mnt/lake/BASE/SQLClassification/cdcTest/dbo/cdcmergetest/1').history().select(col("version"))


Can someone show me how to modify the dataframe so that it returns only the maximum value, i.e. 3?

I have tried

df.select("*").max("version")

And

df.max("version")

But no luck

Any thoughts?

  • Try `df.select(max("version").alias("version")).show()` – arudsekaberne Mar 18 '23 at 17:39
  • Does this answer your question? [Best way to get the max value in a Spark dataframe column](https://stackoverflow.com/questions/33224740/best-way-to-get-the-max-value-in-a-spark-dataframe-column) – Lamanus Mar 19 '23 at 04:37

1 Answer


Use the max function; this should work:

from pyspark.sql import functions as F

df.select(F.max("version").alias("max_version")).show()

or

df.agg(F.max("version").alias("max_version")).show()

Input:

+-------+
|version|
+-------+
|      0|
|      1|
|      3|
|      2|
+-------+

Output:

+-----------+
|max_version|
+-----------+
|          3|
+-----------+
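
If you need the maximum as a plain Python value rather than a one-row DataFrame, a minimal sketch (assuming the same df as above):

from pyspark.sql import functions as F

# Aggregate, then pull the single row back to the driver and read its first column
max_version = df.agg(F.max("version")).collect()[0][0]
print(max_version)  # 3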