0

my dictionary has tuples of (item, field) as keys and (1,1) dataframes as values with columns equal tuples (item, field), row index as a single date (same for all items), for example:

Key
('AB US Equity', 'CHG_PCT_1D')
('AB US Equity', 'PX_LAST')
('IBM US Equity', 'CHG_PCT_1D')
('IBM US Equity', 'PX_LAST')

Value   
    ('AB US Equity', 'CHG_PCT_1D')
2/15/2019   0.5362

('AB US Equity', 'PX_LAST')
2/15/2019   30

here is the dic:

   In [100]: str(dic)
Out[100]: "{('IBM US Equity', 'PX_LAST'):             (IBM US Equity, PX_LAST)\n2019-02-15                    138.03, ('IBM US Equity', 'CHG_PCT_1D'):             (IBM US Equity, CHG_PCT_1D)\n2019-02-15                       1.1357, ('AB US Equity', 'PX_LAST'):             (AB US Equity, PX_LAST)\n2019-02-15                     30.0, ('AB US Equity', 'CHG_PCT_1D'):             (AB US Equity, CHG_PCT_1D)\n2019-02-15                      0.5362}"

the below code puts them horizontally:

 output=pandas.concat(dic.values(),axis=1)
 output.columns=pandas.MultiIndex.from_tuples(output.columns)
 output.columns.names=['Item','Field']

while having hundreds of items and a few fields I'd like to unpack it in the following vertical flat df:

   Value    CHG_PCT_1D  PX_LAST
AB US Equity    0.5362  30
IBM US Equity   1.1357  138.03

SOLUTION (mainly based on the answer below)

import pandas as pd

dic = {('IBM US Equity', 'PX_LAST'): '2019-02-15    138.03',
 ('IBM US Equity', 'CHG_PCT_1D'):           
 '2019-02-15                       1.1357',
 ('AB US Equity', 'PX_LAST'):             
 '2019-02-15                     30.0',
 ('AB US Equity', 'CHG_PCT_1D'):
 '2019-02-15                      0.5362'}

output = pd.DataFrame.from_dict(dic, orient = 'index')
# to series removing date
output = output[0].str.split().str[-1]
# unstack help, first example
index = pd.MultiIndex.from_tuples(output.index)
output = pd.Series(output.values, index=index).unstack()
gregV
  • 987
  • 9
  • 28

1 Answers1

1

Please take the SO tour and read about How to Ask. These will help you gather helpful responses more quickly.

IIUC, you are looking to unstack the innermost level of the MultiIndex. Here is the code I reproduced from your question:

dic = {('IBM US Equity', 'PX_LAST'): '2019-02-15                    138.03',
 ('IBM US Equity', 'CHG_PCT_1D'):           
 '2019-02-15                       1.1357',
 ('AB US Equity', 'PX_LAST'):             
 '2019-02-15                     30.0',
 ('AB US Equity', 'CHG_PCT_1D'):
 '2019-02-15                      0.5362'}

output = pd.DataFrame.from_dict(dic, orient = 'index')
output = output.reset_index()
output['a'] = output.apply(lambda row: row['index'][0], axis = 1)
output['b'] = output.apply(lambda row: row['index'][1], axis = 1)
output = output.drop(['index'], axis = 1)
output = output.groupby(['a', 'b']).sum().unstack()
output = output[0]
output['CHG_PCT_1D'] = output['CHG_PCT_1D'].str.split().str[-1]
output['PX_LAST'] = output['PX_LAST'].str.split().str[-1]

So when you run output.to_string(), you get:

b             CHG_PCT_1D PX_LAST
a                               
AB US Equity      0.5362    30.0
IBM US Equity     1.1357  138.03

Based on your question, I have no clue what you want the different columns or indices to be named. I also don't know what type of aggregate function to run on the grouped dataframe.

Cilantro Ditrek
  • 1,047
  • 1
  • 14
  • 26
  • Apologies, wanted output.columns = ['CHG_PCT_1D', 'PX_LAST'] and drop the date completely from the output. – gregV Feb 19 '19 at 21:51
  • I've updated my code to address your comment. The solution is based on code found here: https://stackoverflow.com/a/13053267/1301888 – Cilantro Ditrek Feb 20 '19 at 15:34