
I have a data frame in PySpark. In this data frame I have a column called id that is unique.

Now I want to find the maximum value of the column id in the data frame.

I have tried the following:

df['id'].max()

But I got the error below:

TypeError: 'Column' object is not callable

Please let me know how to find the maximum value of a column in a data frame.

In the answer by @Dadep, the link gives the correct answer.

User12345

4 Answers


If you are using pandas, .max() will work:

>>> import pandas as pd
>>> df2 = pd.DataFrame({'A': [1, 5, 0], 'B': [3, 5, 6]})
>>> df2['A'].max()
5

Otherwise, if it's a Spark dataframe, see this question:

Best way to get the max value in a Spark dataframe column
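For convenience, here is a minimal sketch of the approach that link describes, assuming df is the PySpark DataFrame from the question:

from pyspark.sql import functions as F

# aggregate the column, then unwrap the single-row, single-column result
max_id = df.select(F.max("id")).collect()[0][0]
print(max_id)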

Dadep

I'm coming from Scala, but I do believe that this is also applicable to Python.

val max = df.select(max("id")).first()

but in Python you first have to import the following:

from pyspark.sql.functions import max
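For reference, a direct Python equivalent of that Scala line might look like this (the import is aliased here to avoid shadowing Python's built-in max; df is assumed to be the DataFrame from the question):

from pyspark.sql.functions import max as spark_max

# first() returns a Row; index into it to get the scalar value
max_row = df.select(spark_max("id")).first()
max_value = max_row[0]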
Haroun Mohammedi

The following can be used in PySpark:

df.select(max("id")).show()
MaartenDev

You can use the aggregate max, as mentioned in the PySpark documentation linked below:

Link : https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=agg

Code:

row1 = df1.agg({"id": "max"}).collect()[0]
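The dictionary form of agg names the result column after the aggregate, so the scalar can then be pulled out of the Row like this (assuming df1 and row1 from the snippet above):

max_value = row1["max(id)"]  # dict-style agg names the column "max(id)"
# or positionally:
max_value = row1[0]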
Deven