I have an input dataframe that looks like this:
col0 col1 name
0 g0 100 n1
1 g0 120 n1
2 g1 40 n3
3 g1 30 n4
4 g2 10 n5
5 g2 100 n5
6 g2 250 n5
I want to obtain a new dataframe (or the same, it doesn't matter) where col1
and col2
contains respectively the minimum and maximum value for each tuple (col0, name)
. I am able to perform this with the following snippet:
import pandas as pd
data = {'col0': ["g0", "g0", "g1", "g1", "g2", "g2", "g2"], 'col1': [100, 120, 40, 30, 10, 100, 250], 'name': ["n1", "n1", "n3", "n4", "n5", "n5", "n5"]}
df = pd.DataFrame(data=data)
df_res = pd.DataFrame()
names = df["name"].unique()
for name in names:
literals = df[df["name"] == name]["col0"].unique()
for literal in literals:
min_val = df[(df["name"] == name) &
(df["col0"] == literal)]["col1"].min()
max_val = df[(df["name"] == name) &
(df["col0"] == literal)]["col1"].min()
elem = {"col0": literal, "col1": min_val, "col2": max_val,
"name": name}
df_res = df_res.append(elem, ignore_index=True)
Which gives the desired output stored on df_res
, which is as follows:
col0 col1 col2 name
0 g0 100.0 100.0 n1
1 g1 40.0 40.0 n3
2 g1 30.0 30.0 n4
3 g2 10.0 10.0 n5
The thing is that I have to make this transformation with a dataframe with 95K registers and my approach is extremely slow (it takes several minutes no finish).
Therefore, my question is: is there any way to optimize this operation?