1

All,

I have a dataframe that looks like this: df[['date','PRICE']]

df>>

date                   Price
                 PX_FIRST     PX_LAST

2018-03-05        1.710       -0.511
2018-03-06        1.725       -0.513
2018-03-07        1.745       -0.511
2018-03-08        1.750       -0.512

how can I get a dataframe similar to this? in other words how can I access PX_FIRST and PX_LAST. When I do df[['date','PRICE']] it dont manage to access individual columns.

  date           PX_FIRST     PX_LAST

2018-03-05        1.710       -0.511
2018-03-06        1.725       -0.513
2018-03-07        1.745       -0.511
2018-03-08        1.750       -0.512
jpp
  • 159,742
  • 34
  • 281
  • 339
SBad
  • 1,245
  • 5
  • 23
  • 36
  • Do you want to flatten the multiindex columns permanently, or just have a nicer view? – jpp Apr 04 '18 at 16:40
  • @jpp I want to flatten the multi index columns please. still trying the solution proposed by jezrael – SBad Apr 04 '18 at 16:47
  • Possible duplicate of [Python Pandas - How to flatten a hierarchical index in columns](https://stackoverflow.com/questions/14507794/python-pandas-how-to-flatten-a-hierarchical-index-in-columns) – jpp Apr 04 '18 at 16:47
  • @SBad - Give me as sec, I try explain for possible problem with it. I edit answer. – jezrael Apr 04 '18 at 16:48

3 Answers3

3

If need select columns under Price value of first level:

df = df['Price']

Or use DataFrame.xs:

df = df.xs('Price', axis=1)
print (df)
            PX_FIRST  PX_LAST
Date                         
2018-03-05     1.710   -0.511
2018-03-06     1.725   -0.513
2018-03-07     1.745   -0.511
2018-03-08     1.750   -0.512

If need remove top level of MultiIndex:

df.columns = df.columns.droplevel(0)

But be carefull if more columns with different first level (Price, Price1) and same values in second level:

#create sample data
df = pd.concat([df['Price'], df['Price'] * 0.4], keys=('Price','Price1'), axis=1)
print (df)
              Price           Price1        
           PX_FIRST PX_LAST PX_FIRST PX_LAST
Date                                        
2018-03-05    1.710  -0.511    0.684 -0.2044
2018-03-06    1.725  -0.513    0.690 -0.2052
2018-03-07    1.745  -0.511    0.698 -0.2044
2018-03-08    1.750  -0.512    0.700 -0.2048

Remove first level:

df.columns = df.columns.droplevel(0)
print (df)
            PX_FIRST  PX_LAST  PX_FIRST  PX_LAST
Date                                            
2018-03-05     1.710   -0.511     0.684  -0.2044
2018-03-06     1.725   -0.513     0.690  -0.2052
2018-03-07     1.745   -0.511     0.698  -0.2044
2018-03-08     1.750   -0.512     0.700  -0.2048

If select column PX_FIRST it return DataFrame, because duplicated columns names:

print (df['PX_FIRST'])
            PX_FIRST  PX_FIRST
Date                          
2018-03-05     1.710     0.684
2018-03-06     1.725     0.690
2018-03-07     1.745     0.698
2018-03-08     1.750     0.700

If need select by both levels, use tuples:

print (df[('Price', 'PX_FIRST')])
Date
2018-03-05    1.710
2018-03-06    1.725
2018-03-07    1.745
2018-03-08    1.750
Name: (Price, PX_FIRST), dtype: float64
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
3

IIUC multiple index

df.loc[:,pd.IndexSlice['Price']]
Out[1108]: 
            PX_FIRST  PX_LAST
Date                         
2018-03-05     1.710   -0.511
2018-03-06     1.725   -0.513
2018-03-07     1.745   -0.511
2018-03-08     1.750   -0.512
BENY
  • 317,841
  • 20
  • 164
  • 234
0

@jezrael You are exactly right when I drop one level I end up with a duplicate column name and it is hard to distinguish columns unless I rename them?

The other challenge in your example below

            PX_FIRST  PX_FIRST
Date                          
2018-03-05     1.710     0.684
2018-03-06     1.725     0.690
2018-03-07     1.745     0.698
2018-03-08     1.750     0.700

is that column "Date", "PX_FIRST" and "PX_FIRST" are in different levels so I call df[['Date','PX_FIRST','PX_FIRST']] i get an error "...not in index"

Ideally, i d be looking to get

 Date          PX_FIRST  PX_LAST                              
2018-03-05     1.710     0.684
2018-03-06     1.725     0.690
2018-03-07     1.745     0.698
2018-03-08     1.750     0.700

All column names are on a similar level and have different names

Thanks

SBad
  • 1,245
  • 5
  • 23
  • 36
  • 1
    You need `df = df.reset_index()`, then is possible use `df[['Date','PX_FIRST','PX_FIRST']]` – jezrael Apr 05 '18 at 07:32
  • 1
    because first column is index and `Date` is index name, not column name – jezrael Apr 05 '18 at 07:33
  • 1
    @jezrael Thanks a lot. because the structure of my data is unusual. I had to drop 2 levels and reindex the date to finally get the desired results. Thank you again – SBad Apr 05 '18 at 08:04