Python subset a data frame based on a variable name

Question

Hi, I have a data frame containing all the stock data for my country's market. The data looks like this:

Ticker  Date/Time   Open    High    Low Close   Volume
AAA     7/15/2010   19.581  20.347  18.429  18.698  174100.0
BBB     7/16/2010   19.002  19.002  17.855  17.855  109200.0
BBB     7/19/2010   19.002  19.002  17.777  17.777  104900.0
CCC     7/20/2010   18.429  18.429  17.084  17.354  328700.0
CCC     7/21/2010   17.354  17.431  16.895  17.316  75800.0

The Ticker column holds the stock name, and each row is the data for one specific date. I would like to write a loop that creates one variable per stock, where the variable's name is the stock name and its value is the subset of the whole dataframe containing that stock's data.

For example,

When I call variable BBB I will get this dataframe:

BBB

Ticker  Date/Time   Open    High    Low Close   Volume
BBB     7/16/2010   19.002  19.002  17.855  17.855  109200.0
BBB     7/19/2010   19.002  19.002  17.777  17.777  104900.0

Could you please advise how I could write this code?

score 1 · Accepted Answer · edited May 23 '17 at 10:31

You can create a dictionary of DataFrames, keyed by ticker name, using groupby and a dict comprehension:

dfs = {idx:x for idx, x in df.groupby('Ticker')}

print (dfs)
{'BBB':   Ticker  Date/Time    Open    High     Low   Close    Volume
1    BBB  7/16/2010  19.002  19.002  17.855  17.855  109200.0
2    BBB  7/19/2010  19.002  19.002  17.777  17.777  104900.0, 
'CCC':   Ticker  Date/Time    Open    High     Low   Close    Volume
3    CCC  7/20/2010  18.429  18.429  17.084  17.354  328700.0
4    CCC  7/21/2010  17.354  17.431  16.895  17.316   75800.0, 
'AAA':   Ticker  Date/Time    Open    High     Low   Close    Volume
0    AAA  7/15/2010  19.581  20.347  18.429  18.698  174100.0}

print (dfs['BBB'])
  Ticker  Date/Time    Open    High     Low   Close    Volume
1    BBB  7/16/2010  19.002  19.002  17.855  17.855  109200.0
2    BBB  7/19/2010  19.002  19.002  17.777  17.777  104900.0

Another solution:

dfs = {x:df[df['Ticker'] == x] for x in df['Ticker'].unique()}
print (dfs['BBB'])
  Ticker  Date/Time    Open    High     Low   Close    Volume
1    BBB  7/16/2010  19.002  19.002  17.855  17.855  109200.0
2    BBB  7/19/2010  19.002  19.002  17.777  17.777  104900.0

EDIT:

Thanks to DSM for the nice suggestion:

dfs = dict(list(df.groupby("Ticker")))
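For reference, here is a self-contained sketch that rebuilds the sample data from the question and checks that the three approaches above produce the same per-ticker frames. How the real frame is loaded is not stated in the question, so the `df` construction and the names `dfs_groupby`, `dfs_mask` and `dfs_dict` are assumptions for illustration only.

```python
import pandas as pd

# Sample rows copied from the question; the loading step itself is an assumption.
df = pd.DataFrame({
    "Ticker": ["AAA", "BBB", "BBB", "CCC", "CCC"],
    "Date/Time": ["7/15/2010", "7/16/2010", "7/19/2010", "7/20/2010", "7/21/2010"],
    "Open": [19.581, 19.002, 19.002, 18.429, 17.354],
    "High": [20.347, 19.002, 19.002, 18.429, 17.431],
    "Low": [18.429, 17.855, 17.777, 17.084, 16.895],
    "Close": [18.698, 17.855, 17.777, 17.354, 17.316],
    "Volume": [174100.0, 109200.0, 104900.0, 328700.0, 75800.0],
})

# 1) groupby + dict comprehension
dfs_groupby = {ticker: group for ticker, group in df.groupby("Ticker")}

# 2) boolean mask for each unique ticker
dfs_mask = {ticker: df[df["Ticker"] == ticker] for ticker in df["Ticker"].unique()}

# 3) dict() over the (ticker, group) pairs yielded by groupby
dfs_dict = dict(list(df.groupby("Ticker")))

# All three dictionaries should hold identical sub-frames for every ticker.
for ticker in dfs_groupby:
    assert dfs_groupby[ticker].equals(dfs_mask[ticker])
    assert dfs_groupby[ticker].equals(dfs_dict[ticker])

print(dfs_groupby["BBB"])
```

Each sub-frame keeps the row labels of the original frame; call `reset_index(drop=True)` on it if you prefer labels starting from 0.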


answered Apr 22 '17 at 13:51

jezrael


So how could I assign the variable names accordingly? I want to have a list of variables [AAA, BBB, CCC] in which AAA = dfs['AAA'], BBB = dfs['BBB'], ... Currently I have to do it manually, but the dataframe has hundreds of symbols. Could you advise? – Anh Hoang Apr 22 '17 at 14:20
Hmm, I think that instead of many variables, one dict holding all of them is better, so instead of `AAA` use `dfs['AAA']`, and instead of `BBB` use `dfs['BBB']`. What you are asking for is not best practice in Python. – jezrael Apr 22 '17 at 14:24
It may also help to check [this](http://stackoverflow.com/a/1373185/2901002). It is possible to use `globals` and `locals` too, but are you sure? In my opinion, so many variables are not necessary if a single dict is enough. Could you explain why you need this? Thanks. – jezrael Apr 22 '17 at 14:36
Actually, after trying it, I think you are correct. I should not create so many variables like this. A dict is a much more efficient option in this case :D – Anh Hoang Apr 22 '17 at 14:48
`dfs = dict(list(df.groupby("Ticker")))` is an alternative to the dictcomp. – DSM Apr 22 '17 at 15:18
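To make the point from the comments concrete, here is a minimal sketch. The dict can be looped over to handle hundreds of symbols without naming each one, while `globals().update(...)` does create the per-ticker names but is generally discouraged. The toy `df` and the `mean_close` name are illustrative assumptions, not part of the original question.

```python
import pandas as pd

# Toy frame with only the columns needed here (an illustrative assumption).
df = pd.DataFrame({
    "Ticker": ["AAA", "BBB", "BBB", "CCC", "CCC"],
    "Close": [18.698, 17.855, 17.777, 17.354, 17.316],
})

dfs = dict(list(df.groupby("Ticker")))

# Work with every symbol through the dict; no per-symbol variable is needed.
mean_close = {ticker: frame["Close"].mean() for ticker, frame in dfs.items()}
print(mean_close)

# Possible but discouraged: inject each frame as a module-level name.
# This is only usable for tickers that are valid Python identifiers,
# and it can silently overwrite existing names.
globals().update(dfs)
print(BBB)
```

With the dict, adding or removing tickers needs no code changes, and a missing symbol raises a clear `KeyError` rather than a `NameError` somewhere later in the script.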
