How to name dataframes in a for-loop?

Question

I am attempting to name multiple dataframes using a variable in a for loop. Here is what I tried:

for name in DF['names'].unique():
    df_name = name + '_df'
    df_name = DF.loc[DF['names'] == str(name)]

If one of the names in the DF['names'] column is 'George', the command below should print the beginning of one of the dataframes that were generated:

George_df.head()

But I get an error message:

TypeError: unsupported operand type(s) for +: 'int' and 'str'
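This TypeError is what Python raises when an `int` is added to a `str`, which suggests that at least one value in `DF['names']` is an integer rather than a string (an assumption; the question does not show the data). A minimal sketch reproducing the error with a hypothetical `names` list, and avoiding it with an f-string:

```python
# Hypothetical stand-in for DF['names'].unique(); the int 42 triggers the error.
names = ['George', 42]

errors = []
for name in names:
    try:
        name + '_df'  # int + str raises TypeError
    except TypeError as exc:
        errors.append(str(exc))

# An f-string converts each value to text first, so it works for str and int.
labels = [f'{name}_df' for name in names]
print(errors)  # ["unsupported operand type(s) for +: 'int' and 'str'"]
print(labels)  # ['George_df', '42_df']
```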

Previous questions discuss ways to do this in a dictionary, but I am looking for a way to implement this for a dataframe.

Does this answer your question? [How do I create a variable number of variables?](https://stackoverflow.com/questions/1373164/how-do-i-create-a-variable-number-of-variables). ... [How can you dynamically create variables via a while loop?](https://stackoverflow.com/questions/5036700/how-can-you-dynamically-create-variables-via-a-while-loop) — wwii, May 11 '20 at 18:48
Probably the most common solution is to keep the objects in a dictionary. — wwii, May 11 '20 at 18:52
When posting a question about code that produces an Exception, always include the complete Traceback - copy and paste it then format it as code (select it and type `ctrl-k`) — wwii, May 11 '20 at 18:54

ansev · Accepted Answer · score 4 · 2020-05-11T19:06:02.560

Setup

import pandas as pd

df = pd.DataFrame({'names': ['a', 'a', 'b', 'b'], 'values': list('1234')})

print(df)

  names values
0     a      1
1     a      2
2     b      3
3     b      4

Using globals and DataFrame.groupby

for name, group in df.groupby('names'):
    globals()[f'df_{name}'] = group
print(df_a)

  names values
0     a      1
1     a      2

print(df_b)

  names values
2     b      3
3     b      4

However, using globals is not recommended; I suggest you use a dictionary instead:

dfs = dict(df.groupby('names').__iter__())
print(dfs['a'])

  names values
0     a      1
1     a      2
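The same dictionary can also be built with a dict comprehension, which avoids calling `__iter__()` directly. Each value is still an ordinary DataFrame, so it supports `.head()` and the rest of the DataFrame API. A minimal sketch on the setup data above:

```python
import pandas as pd

df = pd.DataFrame({'names': ['a', 'a', 'b', 'b'], 'values': list('1234')})

# One entry per group: the key is the group label, the value is that group's rows.
dfs = {name: group for name, group in df.groupby('names')}

print(sorted(dfs))      # ['a', 'b']
print(dfs['b'].head())  # an ordinary DataFrame holding rows 2 and 3
```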

edited May 11 '20 at 19:06

answered May 11 '20 at 18:52

ansev


Minor semantics - the phrasing sounds like you're recommending `globals` over using a `dict`. I'm quite sure that's not the case since you don't sound *insane* to me. – r.ook May 11 '20 at 19:01
@ansev thank you for this explanation! Can I use the same command to perform an operation on each of the dataframes? Like: `df_{name}['Num'] = np.arange(1, 521)` – arkadiy May 11 '20 at 19:03
@r.ook Why is using globals insane? Can it damage something? – arkadiy May 11 '20 at 19:04

Using the `globals` scope is usually frowned upon, as it clutters your namespace and makes your code harder to work with the more complex it gets. If you can get away with one `dict` that manages all your variable names instead of, say, *100* names in your global scope, life is much easier. That, and you might unwittingly overwrite some existing names. – r.ook May 11 '20 at 19:06
What do you mean by `df_{name}['Num'] = np.arange(1, 521)`? Do you want to edit the original dataframe or the new dataframes? This operation is really easy to do, but which solution is best depends on your problem; the right choice keeps the code readable and efficient. – ansev May 11 '20 at 19:09

I recommend using a dictionary and I discourage the use of globals; I think I explained it badly in my answer :) @r.ook – ansev May 11 '20 at 19:11
@arkadiy your additional question actually is part of why you wouldn't want to add individual names into `globals`. Instead of having an iterable `dict` object `for df in dfs.values(): df['Num'] = ...`, now you have to have *n* lines of `df_n['Num'] = ...` – r.ook May 11 '20 at 19:16
To assign values based on the group, it is probably easier to first build the column from the groups on the original dataframe and then split it with groupby ... `df_{name}['Num'] = np.arange(1, 521)` is similar to `dict(df.assign(num = df.groupby('names').cumcount().add(1)).groupby('names').__iter__())` – ansev May 11 '20 at 19:21
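Both suggestions from this thread can be sketched on the setup data. The first follows r.ook's idea of iterating over the dict, using `assign` to build a new frame per group rather than mutating the groupby slices in place (a choice made here to avoid pandas' chained-assignment warnings). The second is ansev's `cumcount` one-liner. `np.arange(1, len(group) + 1)` stands in for the fixed `np.arange(1, 521)`, which would only fit groups of exactly 520 rows, and the column is called `Num` in both for easy comparison:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'names': ['a', 'a', 'b', 'b'], 'values': list('1234')})
dfs = dict(iter(df.groupby('names')))

# Loop over the dict: number the rows of each group 1..n.
dfs = {name: group.assign(Num=np.arange(1, len(group) + 1))
       for name, group in dfs.items()}

# No per-group loop: cumcount numbers rows within each group starting at 0.
numbered = df.assign(Num=df.groupby('names').cumcount().add(1))
dfs2 = dict(iter(numbered.groupby('names')))

print(dfs['b']['Num'].tolist())   # [1, 2]
print(dfs2['b']['Num'].tolist())  # [1, 2]
```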

score 0 · Answer 2 · answered May 11 '20 at 18:53


I would recommend going with a dictionary structure like so:

import pandas as pd

test_dict = {}
test_dict["George"] = pd.DataFrame({"A": [1, 2, 3, 4, 5]})

In your case:

test_dict = {}

for name in DF['names'].unique():
    df_name = f'{name}_df'  # f-string also works when name is not a str
    test_dict[df_name] = DF.loc[DF['names'] == name]

But if you really need to create new variables, this post explains how to do it:

for name in DF['names'].unique():
    df_name = f'{name}_df'  # f-string also works when name is not a str
    globals()[df_name] = DF.loc[DF['names'] == name]

answered May 11 '20 at 18:53

webb


We shouldn't use Series.unique + boolean indexing here; it is slow. We should use groupby instead. – ansev May 11 '20 at 18:57
@ansev I agree. I was trying to align it similar to his code, but yours is definitely more efficient. – webb May 11 '20 at 19:03
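Following ansev's suggestion, the `unique()` + boolean-indexing loop can be replaced by a single `groupby`, which splits the frame in one pass instead of re-scanning it for every name. A sketch that keeps the `<name>_df` keys from the answer; the small `DF` below is made-up sample data, not from the question:

```python
import pandas as pd

# Made-up sample data standing in for the question's DF.
DF = pd.DataFrame({'names': ['George', 'George', 'Anna'], 'score': [1, 2, 3]})

# One pass over DF; the f-string key also tolerates non-string names.
test_dict = {f'{name}_df': group for name, group in DF.groupby('names')}

print(test_dict['George_df'].head())
```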
