I have a spark dataframe as follows:
+-----+------+
|    A| count|
+-----+------+
|dummy|    23|
|  ABC|   157|
|  abc|    15|
+-----+------+
I am trying to find the maximum value in this column [157 in the example above], and this is what I have done:
max_value = df.agg({"count": "max"}).collect()[0][0]
I am new to Spark programming. Although the solution above works, I am unsure how efficient it will be on large data [say a few million rows], since it involves a reduction step. Are there more efficient ways to get the max value of a column?
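For reference, here is the same aggregation written with the pyspark.sql.functions API [this is just an equivalent sketch of what I have; as far as I understand, both forms express a single aggregation, so Spark computes a partial max per partition and then combines the partial results]:

from pyspark.sql import functions as F

# Equivalent to df.agg({"count": "max"}): one aggregation over the column
max_value = df.agg(F.max("count")).collect()[0][0]

# The same via select, reading the value back by its alias
max_value = df.select(F.max("count").alias("max_count")).first()["max_count"]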
PS: I have gone through many solutions on the internet [such as https://stackoverflow.com/questions/33224740/best-way-to-get-the-max-value-in-a-spark-dataframe-column] and have not come across one that addresses performance.
EDIT 1: The dataframe I am dealing with has multiple columns containing large amounts of data.
EDIT 2: These are the transformations performed on the data before the max value is fetched (a minimal sketch of these steps follows the list):
a) I get my input data from Google Cloud Platform (in Parquet).
b) This data is converted into a PySpark dataframe.
c) I then add a "count" column to this dataframe.
d) Then, from the "count" column, I would like to fetch the max value.
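Here is the sketch of these steps [the bucket path, the app name, and the groupBy used to produce the "count" column are placeholders, not my actual code]:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("max-count-example").getOrCreate()

# a) + b) Read the Parquet input into a PySpark dataframe
#         (placeholder path, not my real bucket)
df = spark.read.parquet("gs://my-bucket/input-data/")

# c) Add a "count" column; the groupBy below is only a stand-in
#    for how the column is actually derived
df = df.groupBy("A").count()

# d) Fetch the max value from the "count" column
max_value = df.agg(F.max("count")).collect()[0][0]
print(max_value)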