0

I would like to create DataFrames in a loop and then use those DataFrames in a later loop. I tried the eval() function, but it didn't work.

For example:

for i in range(5):
    df_i = df[(df.age == i)]

Here I would like to create df_0, df_1, etc., and then concatenate these new DataFrames after some calculations:

final_df = pd.concat(df_0,df_1)

for i in range(2:5):
    final_df = pd.concat(final_df, df_i)
Erfan
Babouch

2 Answers

1

You can create a dict of DataFrames, x, with the values of i as the dict keys:

import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame({'age': np.random.randint(0, 5, 20)})

x = {}
for i in range(5):
    x[i] = df[df['age']==i]

final = pd.concat(x.values())

Then you can refer to individual DataFrames as:

x[1]

Output:

    age
5     1
13    1
15    1

And concatenate all of them with:

pd.concat(x.values())

Output:

    age
18    0
5     1
13    1
15    1
2     2
6     2
...
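
The same dict idea extends to subsets built from more than one loop variable: a tuple of the loop values can serve as the key, so no generated variable names are needed. The following is only a sketch; the 'group' column and the name subsets are assumptions made for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the 'group' column is an assumption for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(0, 3, 12),
    "group": rng.choice(["a", "b"], 12),
})

# Key each subset by the pair of loop values instead of a generated variable name.
subsets = {}
for age in range(3):
    for group in ["a", "b"]:
        subsets[(age, group)] = df[(df["age"] == age) & (df["group"] == group)]

# Look up one subset by its key, or concatenate all of them.
one = subsets[(1, "a")]
final = pd.concat(subsets.values())
```

A subset is then retrieved as subsets[(1, "a")], which plays the role that a name like df_1_a would have played.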
perl
  • Thank you for your help. Is it possible to name each DataFrame depending on i? In fact, I will be creating DataFrames in a double loop... – Babouch Mar 21 '19 at 10:19
  • Technically, yes, you can create variables with exec like `exec(f"df_{i} = df[df['age']==i]")`, but it's normally not recommended. See for example https://stackoverflow.com/questions/5036700/how-can-you-dynamically-create-variables-via-a-while-loop – perl Mar 21 '19 at 10:23
0

This approach is unusual and not recommended, but it can be done.

Answer

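# Note: creating variables with exec like this only works at module level.
for i in range(5):
    exec(f"df_{i} = df[df['age'] == {i}]")

def UDF(dfi):
    # do something in user-defined function
    return dfi

for i in range(5):
    exec(f"df_{i} = UDF(df_{i})")

# pd.concat takes a list (or other iterable) of DataFrames
final_df = pd.concat([df_0, df_1])

for i in range(2, 5):
    final_df = pd.concat([final_df, eval(f"df_{i}")])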

Better Way 1

Storing the DataFrames in a list or a dict is a better approach, since you can access each DataFrame by index or key.

Since another answer (@perl) already shows the dict approach, I will show the list approach.

def UDF(dfi):
    # do something in user-defined function
    return dfi

dfs = [df[df['age'] == i] for i in range(5)]
final_df = pd.concat(map(UDF, dfs))

Better Way 2

Since you are using pandas.DataFrame, groupby is the 'pandas' way to do what you want (probably; I'm guessing, since I don't know exactly what you want to do).

def UDF(dfi):
    # do something in user-defined function
    return dfi

final_df = df.groupby('age').apply(UDF)
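
A small runnable sketch of the groupby idea, in case a concrete case helps. The 'score' column and the center_score function are hypothetical stand-ins for the real data and the real UDF. Instead of filtering once per age value, iterating over df.groupby('age') yields each sub-DataFrame together with its key:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"age": [0, 1, 1, 2, 2, 2], "score": [10, 20, 40, 30, 60, 90]})

def center_score(group_df):
    # Hypothetical per-group calculation: subtract the group's mean score.
    return group_df.assign(centered=group_df["score"] - group_df["score"].mean())

# Iterating over a groupby yields (key, sub-DataFrame) pairs, one per distinct age.
by_age = {age: center_score(sub_df) for age, sub_df in df.groupby("age")}

# Each processed piece stays reachable by key, and all pieces concatenate as before.
final_df = pd.concat(by_age.values())
```

Here by_age[1] is the processed DataFrame for age 1, so the per-age pieces remain available without any df_0, df_1 variables.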

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html

Liuhonwun