Full disclosure: I am a newbie, so please be patient with me. I have a data file, shown below. I need to sort it by the Area column (the zip code) first, and then find the highest score per zip code.
Fname Lname Area Score
Amy Doe 3 245
Jon Doe 1 310
Jane Doe 2 724
Brian Doe 1 840
Gary Doe 3 632
Jen Doe 2 854
Jim Doe 3 132
Rick Doe 1 445
My code:
import pandas as pd
from pandas import DataFrame

# read the data file, splitting fields on commas
file = pd.read_csv('test.dat', delimiter=',')
# keep only the columns I need
df = DataFrame(file, columns=['Fname', 'Lname', 'Score', 'Area'])
# sort by area, then by score
df.sort_values(by=['Area', 'Score'], inplace=True)
print(df)
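A minimal sketch of why every cell could come back as NaN, assuming test.dat is whitespace-separated exactly as in the sample above (the real file may differ). `raw` is a hypothetical in-memory stand-in for the file.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.dat, assumed to be whitespace-separated.
raw = """Fname Lname Area Score
Amy Doe 3 245
Jon Doe 1 310
Jane Doe 2 724
Brian Doe 1 840
Gary Doe 3 632
Jen Doe 2 854
Jim Doe 3 132
Rick Doe 1 445
"""

# With delimiter=',' there are no commas to split on, so each whole line
# becomes a single field under one column named 'Fname Lname Area Score'.
wrong = pd.read_csv(io.StringIO(raw), delimiter=",")
print(wrong.columns.tolist())

# DataFrame(..., columns=[...]) reindexes: any requested name missing from
# `wrong` becomes a column filled with NaN, which here is all of them.
cols = ["Fname", "Lname", "Score", "Area"]
print(pd.DataFrame(wrong, columns=cols).isna().all().all())

# Splitting on runs of whitespace instead gives the four expected columns.
right = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(right.columns.tolist())
```

If the real file is comma-separated, the first read would split correctly and only names absent from its header (such as `zip_code`) would come back as NaN.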
The desired output would be something to this effect:
Fname Lname Area Score
Brian Doe 1 840-->Winner!
Rick Doe 1 445
Jon Doe 1 310
Jen Doe 2 854-->Winner!
Jane Doe 2 724
Gary Doe 3 632-->Winner!
Amy Doe 3 245
Jim Doe 3 132
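One way to get close to that layout, sketched under the same assumption that the file is whitespace-separated: sort by `Area` ascending and `Score` descending, then flag each row whose score equals its area's maximum using `groupby(...).transform("max")`. The `Winner` column name is only illustrative.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.dat, assumed to be whitespace-separated.
raw = """Fname Lname Area Score
Amy Doe 3 245
Jon Doe 1 310
Jane Doe 2 724
Brian Doe 1 840
Gary Doe 3 632
Jen Doe 2 854
Jim Doe 3 132
Rick Doe 1 445
"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Area ascending, and within each area the highest score first.
df = df.sort_values(by=["Area", "Score"], ascending=[True, False])

# transform("max") returns each row's area maximum, aligned to the rows,
# so the comparison is True only for the top scorer(s) in each area.
df["Winner"] = df["Score"] == df.groupby("Area")["Score"].transform("max")

print(df.to_string(index=False))
```

If two people tie for the top score in an area, both rows are flagged.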
This is what I get instead:
Fname Lname Score Area
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN
2 NaN NaN NaN NaN
3 NaN NaN NaN NaN
I have not figured out how to get the highest score per area yet either. Can you please tell me what I am doing wrong?
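For the per-area numbers themselves, a short sketch under the same file-format assumption: `groupby("Area")["Score"]` followed by `.max()` gives each area's highest score, `.idxmax()` locates the row holding it, and `.sum()` gives a per-area total in case a sum was also wanted.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.dat, assumed to be whitespace-separated.
raw = """Fname Lname Area Score
Amy Doe 3 245
Jon Doe 1 310
Jane Doe 2 724
Brian Doe 1 840
Gary Doe 3 632
Jen Doe 2 854
Jim Doe 3 132
Rick Doe 1 445
"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
scores = df.groupby("Area")["Score"]

best = scores.max()    # highest score per area, indexed by Area
totals = scores.sum()  # total score per area, indexed by Area

# idxmax gives the index label of the (first) max row in each area;
# .loc then pulls those whole rows, names included.
winners = df.loc[scores.idxmax()]

print(best)
print(totals)
print(winners)
```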