Suppose I have a DataFrame of the form:
   X  Y
0  1  1
1  1  2
2  1  3
3  2  4
4  2  5
5  2  6
6  3  7
7  3  8
8  3  9
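To make the example reproducible, the frame above can be rebuilt with a snippet like this. It assumes only the column names X and Y and the default 0..8 index shown in the table:

```python
import pandas as pd

# Rebuild the sample DataFrame: three groups in X, three increasing
# Y values per group, default RangeIndex 0..8.
df = pd.DataFrame({
    "X": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Y": [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

print(df)
```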
For each unique value of X (here 1, 2, and 3), I want to keep only the row where Y is at its minimum. So the resulting DataFrame should look like:
   X  Y
0  1  1
1  2  4
2  3  7
I know how to get the unique values of X: df['X'].unique(). So my naive first approach was:
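import pandas as pd

X_list = []
for i in df['X'].unique():
    # All rows belonging to the current X value
    i_df = df.loc[df['X'] == i]
    # Keep the row whose Y is smallest (idxmin returns its row label)
    X_list.append(i_df.loc[i_df['Y'].idxmin()])
new_df = pd.DataFrame(X_list)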
However, this causes problems because the indices get messed up when I combine these isolated rows into a new DataFrame. Specifically, I get a KeyError saying that a row with a given index does not exist when I try to access these rows in the new DataFrame by their original index from the old DataFrame. I tried to fix this with the reset_index function, but that didn't work either.
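A quick way to check which row labels the loop from the question actually produces is to print the index of the result. The sketch below reruns that loop on the sample data; it is a diagnostic, not a claim about exactly which lookup triggered the KeyError. Each appended row is a Series whose name is its original label, and pd.DataFrame uses those names as the new index:

```python
import pandas as pd

df = pd.DataFrame({
    "X": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Y": [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# The loop from the question: one row (a Series) per unique X.
X_list = []
for i in df["X"].unique():
    i_df = df.loc[df["X"] == i]
    X_list.append(i_df.loc[i_df["Y"].idxmin()])
new_df = pd.DataFrame(X_list)

# Each Series keeps its original row label as .name, so the new
# frame is indexed 0, 3, 6 instead of 0, 1, 2.
print(new_df.index.tolist())

# Label lookups only succeed for labels that survived:
# new_df.loc[3] works, while new_df.loc[1] raises KeyError.

# reset_index(drop=True) discards the old labels and renumbers from 0;
# without drop=True the old labels are kept as a new "index" column.
renumbered = new_df.reset_index(drop=True)
print(renumbered.index.tolist())
```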
So I would like to know whether there is a clean way to query this directly from the original DataFrame, or whether there is a change to my solution that makes it behave like a normal DataFrame without these indexing problems. I suspect there is some way to do this with the groupby function, but I can't work out how.
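One possible groupby-based sketch, assuming the same X and Y columns: groupby('X')['Y'].idxmin() gives, for each X, the row label of the smallest Y, and those labels can be passed straight to .loc on the original frame:

```python
import pandas as pd

df = pd.DataFrame({
    "X": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Y": [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# For each X, the row label of the minimum Y (first label on ties).
min_labels = df.groupby("X")["Y"].idxmin()

# Select those rows from the original frame by label, then renumber 0..n-1.
result = df.loc[min_labels].reset_index(drop=True)
print(result)
```

Selecting whole rows by label this way keeps every other column of the chosen rows, which a plain df.groupby('X')['Y'].min() would not, since that returns only the minimum Y per group.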