Removing observation with min value in a column out of the dataframe one by one using loop in Python

Question

I have a dataframe "data" that looks like this:

f1	f2	f3
11	34	a
14	10	a
20	12	a
15	19	b
19	29	b
29	30	b

I want to find the minimum value of f2 IF f3 is a. I don't want to find min value of f2 for f3 = a or f3 = b, just when f3 = a. And then I want to remove the observation that is associated with that min value in f2 in the dataframe. So I had this code:

a_part = data[data['f3'] == 'a'
min1 = a_part['f2'].min()
min1 = data['f2'] = min1
data_new_1 = pd.dataframe(data.loc[~min1])

Which works well. And now my dataframe looks like:

f1	f2	f3
11	34	a
20	12	a
15	19	b
19	29	b
29	30	b

However, I want to remove the min value of f2 when f3 = a and the associated observation one by one by using a loop, and having a new dataframe each time. So essentially data_new_2 looks like:

f1	f2	f3
11	34	a
15	19	b
19	29	b
29	30	b

Until there's only b left in f3. I tried to do a loop for it:

for i in range(1,6):
    IN = data_new_i[['f3'] == 'a']
    min1 = a_part['f2'].min() 
    min1 = data_new_i['f2'] == min1
    vars()[data_new_i++] = pd.DataFrame(data.loc[~min1])

And this doesn't work. I'm very unfamiliar with the way Python treats new dataframe name with loop index. I think I have to use dict to put new dataframe in, but I don't know how I can extract a column of a dataframe from a dict, and how I can save the new dataframe into the dict. Can someone help me please?

jezrael · Accepted Answer · 2021-09-17T05:46:30.620

In my solution ouput is list of DataFrames.

If there are always unique values in column f2 use loop by index values of sorted column and drop row by minimal value:

out = []
data1 = data.sort_values('f2')
for i in data1.loc[data1['f3'] == 'a', 'f2'].index:
    data = data.drop(i)
    out.append(data)
print (out)
[   f1  f2 f3
0  11  34  a
2  20  12  a
3  15  19  b
4  19  29  b
5  29  30  b,    f1  f2 f3
0  11  34  a
3  15  19  b
4  19  29  b
5  29  30  b,    f1  f2 f3
3  15  19  b
4  19  29  b
5  29  30  b]

If possible duplicates and need remove all dupes, like here in first loop all rows with f2=10 use:

print (data)
   f1  f2 f3
0  11  10  a
1  14  10  a
2  20  12  a
3  15  19  b
4  19  29  b
5  29  30  b

out = []
data1 = data.sort_values('f2')
for i, g in data1.groupby(data1.loc[data1['f3'] == 'a', 'f2']):
    data = data.drop(g.index)
    out.append(data)
print (out)
[   f1  f2 f3
2  20  12  a
3  15  19  b
4  19  29  b
5  29  30  b,    f1  f2 f3
3  15  19  b
4  19  29  b
5  29  30  b]

It is not recommended, but possible create DataFrames by groups:

data1 = data.sort_values('f2')
for j, (i, g) in enumerate(data1.groupby(data1.loc[data1['f3'] == 'a', 'f2']), 1):
    data = data.drop(g.index)
    globals()[f'data_new_{j}'] = data
print (data_new_1)
   f1  f2 f3
2  20  12  a
3  15  19  b
4  19  29  b
5  29  30  b

print (data_new_2)
   f1  f2 f3
3  15  19  b
4  19  29  b
5  29  30  b

May I ask what do the i and g represent? Does i represent the location where f3 = a, and g represents each block of unique values of f2? — shuu, Sep 17 '21 at 16:18
@shuu i is here name of group, values used for grouping here 10 and 12 and g is group. — jezrael, Sep 17 '21 at 20:12

Removing observation with min value in a column out of the dataframe one by one using loop in Python

1 Answers1