I have a dataframe with the following 3 columns
ID Department Number
---------------------------------
2324 Art 4
2324 Math 1
2324 Art 3
2400 Science 2
2593 Tech 5
2593 Math 1
I'm trying to filter first by ID, then by Number. As you can see, some IDs repeat. I want to first find the IDs that repeat, then choose the highest corresponding Number.
For instance, as you can see the ID 2324 repeats 3 times. (4, 1, 3) are the Numbers that correspond with each entry of 2324. Since 4 is the bigger number, I choose the entry with 4. I want to filter the dataframe to get this output:
ID Department Number
---------------------------------
2324 Art 4
2400 Science 2
2593 Tech 5
This is my code so far :
for previous, current in zip(df['ID'], df['ID'][1:]):
for i, j in zip(df['Number'], df['Number'][1:]):
if previous == current:
if j> i:
However, I don't know what I should add next to correctly print(previous, j)
.
If I add print(previous, j)
to the nested loop, I get repeated entries.
For instance, if my code was
for previous, current in zip(df['ID'], df['ID'][1:]):
for i, j in zip(df['Number'], df['Number'][1:]):
if previous == current:
if j> i:
print(previous, j)
it outputs
2324 4
2324 4
2324 4
2324 4
2324 4
2324 4
2593 5
2593 5
2593 5
2593 5
I want it to output:
2324 4
2593 5
I also want it to include the ID that was not repeated, so 2400 2
in this case.
Additionally, I don't know how to append the correct Department name to the nested loop.
Thank you to anyone who took the time to read this and help me out. I really appreciate it.