Create dynamic dataframe name by splitting a larger dataframe

Question

I have a large CSV and would like to split it into, e.g., 4 parts with names generated in the loop, e.g. sub0, sub1, sub2, sub3. I can do the split itself as follows:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(20, 3)), columns=list('ABC'))

for i, chunk in enumerate(np.array_split(df, 4)):
    print(chunk.head(2))  # just to check
    print(chunk.tail(1))  # just to check

    sub+str(i) = chunk.copy()  # this gives error

But assigning the names in the last line gives the expected error: SyntaxError: can't assign to operator.

Q: How do I get sub0, ..., sub3 by copying each chunk in the loop? Thank you!

Possible duplicate of [Python Pandas Dynamically Create a Dataframe](https://stackoverflow.com/questions/47109931/python-pandas-dynamically-create-a-dataframe) — yatu, Mar 05 '19 at 11:13
best to create a dict with the names as keys: `chunks = {f'{sub}{i}':chunk for i, chunk in enumerate(np.array_split(df, 10))}` — Chris Adams, Mar 05 '19 at 11:18
What is the expected output? 10 separate DataFrames? Adding the expected output to the question would make it a bit easier to answer. — John Sloper, Mar 05 '19 at 11:41
@ChrisA could you check my edit please? I cant get the output with your line even though I know it is almost there — physiker, Mar 05 '19 at 12:42

score 1 · Answer 1 · answered Mar 05 '19 at 11:18

Why would you want to create variables in a loop?

They are unnecessary: you can store everything in a list or any other type of collection.
They are hard to create and reuse: you have to use exec or globals().

Using a list is much easier:

subs = []
for chunk in np.array_split(df, 10):
    print(chunk.head(2))  # just to check
    print(chunk.tail(1))  # just to check
    subs.append(chunk.copy())  # each element is still a DataFrame
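
To illustrate that nothing is lost by keeping the pieces in a list, here is a minimal sketch using the 4-way split from the question. Two things in it are assumptions rather than part of the answer: `column_means` is a hypothetical stand-in for whatever processing function expects a DataFrame, and the split is done on row positions with `iloc` instead of passing the frame directly to `np.array_split`, which sidesteps differences between pandas versions.

```python
import numpy as np
import pandas as pd

# Same toy frame as in the question: 20 rows, 3 integer columns.
df = pd.DataFrame(np.random.randint(0, 100, size=(20, 3)), columns=list("ABC"))

# Split the row positions into 4 nearly equal groups, then slice with iloc.
# (Assumption: positional slicing replaces np.array_split(df, 4).)
positions = np.array_split(np.arange(len(df)), 4)
subs = [df.iloc[pos].copy() for pos in positions]


def column_means(frame):
    """Hypothetical helper: any function that expects a DataFrame."""
    return frame.mean()


first = subs[0]  # plays the role of "sub0"
means = [column_means(s) for s in subs]  # every element is a real DataFrame
```

Each `subs[i]` can be passed to other functions exactly like a separately named variable would be.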

answered Mar 05 '19 at 11:18

Albert Alonso

Thanks @Albert, your comments are certainly valid. However, I need DataFrames rather than a list. I agree that my approach is not optimal, which is why I would like to know a better solution that gives me DataFrames, because I need to use them in several other functions for processing. – physiker Mar 05 '19 at 11:30

You can still access each DataFrame inside a list. You lose none of its functionality; only the way you reference it changes: `my_list[0]`. You can even use a dictionary: `my_dict['myfirstdf']`. – Parfait Mar 05 '19 at 13:51

Chris Adams · Accepted Answer · 2019-03-05T13:13:53.107

1

The best way is to create a dict with the dynamic names as keys:

sub = 'sub'  # name prefix; must be defined before the comprehension runs
chunks = {f'{sub}{i}': chunk for i, chunk in enumerate(np.array_split(df, 10))}
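
As a rough sketch of how the frames are then reached, the example below builds the dict for the 4-way split from the question and reads it back by name. The `prefix` variable and the `iloc`-based split on row positions are assumptions made for this illustration, not part of the answer.

```python
import numpy as np
import pandas as pd

# Same toy frame as in the question: 20 rows, 3 integer columns.
df = pd.DataFrame(np.random.randint(0, 100, size=(20, 3)), columns=list("ABC"))

prefix = "sub"  # hypothetical name prefix for the generated keys
# Assumption: split row positions and slice with iloc.
positions = np.array_split(np.arange(len(df)), 4)
chunks = {f"{prefix}{i}": df.iloc[pos].copy() for i, pos in enumerate(positions)}

print(list(chunks))  # the generated names: sub0 ... sub3

sub0 = chunks["sub0"]  # look up a single frame by its name

# Loop over every name/frame pair, e.g. to feed them to other functions.
for name, frame in chunks.items():
    print(name, frame.shape)
```

The frames live inside `chunks`, so tools that list top-level variables will show only the dict, not the individual DataFrames.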

If you absolutely insist on creating the frames as individual variables, then you could assign them to the globals() dictionary, but this method is NOT advised:

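sub = 'sub'  # name prefix; must be defined before the loop runs
for i, chunk in enumerate(np.array_split(df, 10)):
    globals()['{}{}'.format(sub, i)] = chunk  # creates sub0, sub1, ... at module level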
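
One way to see why this is discouraged: a sketch, assuming the 4-way split from the question and an `iloc`-based split on row positions (both choices made for this illustration). The names only come into existence at run time, so any code that wants to handle all of them generically still ends up doing a dictionary lookup.

```python
import numpy as np
import pandas as pd

# Same toy frame as in the question: 20 rows, 3 integer columns.
df = pd.DataFrame(np.random.randint(0, 100, size=(20, 3)), columns=list("ABC"))

# Assumption: split row positions and slice with iloc.
positions = np.array_split(np.arange(len(df)), 4)
for i, pos in enumerate(positions):
    # Writes sub0 ... sub3 into the module-level namespace.
    globals()[f"sub{i}"] = df.iloc[pos].copy()

# Collecting them again requires the same string-keyed lookup a dict would use.
frames = [globals()[f"sub{i}"] for i in range(4)]
print([f.shape for f in frames])
```

In other words, `globals()` is already a dict here; a dedicated dict gives the same access without mixing generated names into the module namespace.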

edited Mar 05 '19 at 13:13

answered Mar 05 '19 at 12:54

Chris Adams

Thanks. How do I access all the new dataframes now? When I check with %who DataFrame, I don't see any. Perhaps the {} in {sub} are a typo? – physiker Mar 05 '19 at 13:01
@physiker I've updated to use `.format` instead of f-strings in case you are using an older version of Python – Chris Adams Mar 05 '19 at 13:15

Thank you, your first approach works better and I agree, it is the more correct way. – physiker Mar 05 '19 at 16:36
