
I want to create multiple dataframes whose names are the same as the values in one of the columns. I would like this code to work like that:

import pandas as pd

data=pd.read_csv('athlete_events.csv')


Sports = data.Sport.unique()

for S in Sports:
    name=str(S)
    name=data.loc[data['Sport']==S]
harvpan
Ammon
  • What do you mean by names of dataframes? – xyzjayne Jul 31 '18 at 18:39
  • `"I would like this code to work like that:"`, like what? Can you show the input and expected output, please? Refer to [MCVE] – harvpan Jul 31 '18 at 18:40
  • Do you mean that you would like to create an unique dataframe for each unique value in the `Sport` column and you would like the variable name for each dataframe to be the same as the `Sport` value? – johnchase Jul 31 '18 at 18:43
  • johnchase Yes, that is exactly what I want. I know I can iterate over the dataframe with different kinds of functions, but I want to reorganize it and split it so that it is easier for me to analyse – Ammon Jul 31 '18 at 18:51

2 Answers


Use a dictionary for organizing your dataframes, and groupby to split them. You can iterate through your groupby object with a dict comprehension.

Example:

>>> data
      Sport  random_data
0    soccer            0
1    soccer            3
2  football            1
3  football            1
4    soccer            4

frames = {i:dat for i, dat in data.groupby('Sport')}

You can then access your frames as you would any other dictionary value:

>>> frames['soccer']
    Sport  random_data
0  soccer            0
1  soccer            3
4  soccer            4

>>> frames['football']
      Sport  random_data
2  football            1
3  football            1
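
For reference, here is a minimal self-contained sketch of the same approach. The data is made up to mirror the example above (in the question it would come from `pd.read_csv('athlete_events.csv')`), and the `row_counts` name is only illustrative:

```python
import pandas as pd

# Illustrative data only, mirroring the small example above.
data = pd.DataFrame({
    'Sport': ['soccer', 'soccer', 'football', 'football', 'soccer'],
    'random_data': [0, 3, 1, 1, 4],
})

# One DataFrame per unique value in the Sport column, keyed by that value.
frames = {sport: group for sport, group in data.groupby('Sport')}

# Work on every split frame by iterating over the dictionary,
# rather than relying on a separate variable name per sport.
row_counts = {sport: len(frame) for sport, frame in frames.items()}
```

Because the keys are ordinary strings, this also works for values that would not be valid Python variable names.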
sacuL

You can do this by modifying globals(), but that's not really advisable.

for S in Sports:
    globals()[str(S)] = data.loc[data['Sport']==S]    

Below is a self-contained example:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'sport':['football', 'football', 'tennis'],
                           'value':[1, 2, 3]})

In [3]: df
Out[3]: 
      sport  value
0  football      1
1  football      2
2    tennis      3

In [4]: for name in df.sport.unique():
    ...:     globals()[name] = df.loc[df.sport == name]
    ...:     

In [5]: football
Out[5]: 
      sport  value
0  football      1
1  football      2

While this is a direct answer to your question, I would recommend sacuL's answer: dictionaries are meant for this (i.e. storing keys and values), and variable names inserted via globals() are usually not a good idea to begin with.

Imagine someone else, or yourself in the future, reading your code: all of a sudden you are using football like a pd.DataFrame that you never explicitly defined. How are you supposed to know what is going on?
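
A further practical problem can be sketched as follows. It assumes a hypothetical sport name that contains a space (`'Alpine Skiing'` here is illustrative data, not taken from the question). Such a value is not a valid Python identifier, so the frame stored under it can only be reached through a dictionary-style lookup on globals() anyway:

```python
import pandas as pd

# Illustrative data; 'Alpine Skiing' is a hypothetical value containing a space.
df = pd.DataFrame({'sport': ['football', 'Alpine Skiing', 'football'],
                   'value': [1, 2, 3]})

# Same technique as above: insert one global name per unique sport.
for name in df.sport.unique():
    globals()[name] = df.loc[df.sport == name]

# 'football' is a valid identifier, so it can be used as a plain variable name.
football_rows = len(globals()['football'])

# 'Alpine Skiing' is not a valid identifier, so it cannot be written as a variable.
is_identifier = 'Alpine Skiing'.isidentifier()

# It is still reachable, but only by string lookup, which is just a dictionary access.
alpine = globals()['Alpine Skiing']
```

At that point globals() is being used as a dictionary, so an explicit dictionary of frames is the clearer choice.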

tobsecret