I have a DataFrame with repeated combinations of values in two columns, and I want to keep only the highest value in the third column for each combination. For the following DataFrame:

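import numpy as np
import pandas as pd
# Note: np.array converts every value to a string, so column 2 holds '3', '6', ...
df = pd.DataFrame(
    np.array([['A', 'B', 3], ['A', 'B', 6], ['C', 'D', 9], ['C', 'D', 2], ['C', 'B', 4]]))
df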

How would I get this DataFrame as a result?

|A|B|6|
|C|D|9|
|C|B|4|
UserX

1 Answer

Use groupby and aggregate max:

df.groupby([0,1], as_index=False)[2].max() 

Here's a post with a similar use case.
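
For reference, here is a minimal self-contained sketch of the groupby approach above. It assumes the frame is built from a plain list of rows rather than np.array, so that column 2 stays numeric; with np.array every value becomes a string and max would compare them as text. The variable name result is only for illustration.

```python
import pandas as pd

# Build the frame from a list of rows so column 2 keeps an integer dtype.
# (Wrapping the rows in np.array would turn every value into a string.)
df = pd.DataFrame(
    [['A', 'B', 3], ['A', 'B', 6], ['C', 'D', 9], ['C', 'D', 2], ['C', 'B', 4]]
)

# Group on the two letter columns and keep the largest number per group.
# as_index=False keeps columns 0 and 1 as regular columns in the result.
result = df.groupby([0, 1], as_index=False)[2].max()
print(result)
```

By default groupby sorts the group keys, so the rows may come out in a different order than in the question. Passing sort=False to groupby should keep the groups in order of first appearance instead.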

webb
  • While that does show the numbers, I would also like it to show the two columns that have the letters. Can you please fix the code, and I'll mark this answer as correct. – UserX May 11 '20 at 03:56
  • When I use this command, I get a three-column DataFrame with two columns containing letters and one column with the numbers. Could you describe the output of your code? Is yours resulting in a Series? Or a single-column DataFrame? – webb May 11 '20 at 04:00
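
One possible explanation for the difference discussed in these comments, offered as a guess rather than a diagnosis: if as_index=False is left out, the letters move into the index and the result is a Series, which can look as if only the numbers are shown. The sketch below illustrates that case and one way to get the letter columns back; the names as_series and as_frame are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    [['A', 'B', 3], ['A', 'B', 6], ['C', 'D', 9], ['C', 'D', 2], ['C', 'B', 4]]
)

# Without as_index=False the letters become a MultiIndex and the result is a Series.
as_series = df.groupby([0, 1])[2].max()

# reset_index() turns the index levels back into ordinary columns.
as_frame = as_series.reset_index()

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(as_frame)
```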