How to access a column within a column using pandas

Question

All,

I have a dataframe that looks like this: df[['date','PRICE']]

df>>

date                   Price
                 PX_FIRST     PX_LAST

2018-03-05        1.710       -0.511
2018-03-06        1.725       -0.513
2018-03-07        1.745       -0.511
2018-03-08        1.750       -0.512

how can I get a dataframe similar to this? in other words how can I access PX_FIRST and PX_LAST. When I do df[['date','PRICE']] it dont manage to access individual columns.

  date           PX_FIRST     PX_LAST

2018-03-05        1.710       -0.511
2018-03-06        1.725       -0.513
2018-03-07        1.745       -0.511
2018-03-08        1.750       -0.512

Do you want to flatten the multiindex columns permanently, or just have a nicer view? — jpp, Apr 04 '18 at 16:40
@jpp I want to flatten the multi index columns please. still trying the solution proposed by jezrael — SBad, Apr 04 '18 at 16:47
Possible duplicate of [Python Pandas - How to flatten a hierarchical index in columns](https://stackoverflow.com/questions/14507794/python-pandas-how-to-flatten-a-hierarchical-index-in-columns) — jpp, Apr 04 '18 at 16:47
@SBad - Give me as sec, I try explain for possible problem with it. I edit answer. — jezrael, Apr 04 '18 at 16:48

jezrael · Accepted Answer · 2018-04-04T16:55:48.183

If need select columns under Price value of first level:

df = df['Price']

Or use DataFrame.xs:

df = df.xs('Price', axis=1)
print (df)
            PX_FIRST  PX_LAST
Date                         
2018-03-05     1.710   -0.511
2018-03-06     1.725   -0.513
2018-03-07     1.745   -0.511
2018-03-08     1.750   -0.512

If need remove top level of MultiIndex:

df.columns = df.columns.droplevel(0)

But be carefull if more columns with different first level (Price, Price1) and same values in second level:

#create sample data
df = pd.concat([df['Price'], df['Price'] * 0.4], keys=('Price','Price1'), axis=1)
print (df)
              Price           Price1        
           PX_FIRST PX_LAST PX_FIRST PX_LAST
Date                                        
2018-03-05    1.710  -0.511    0.684 -0.2044
2018-03-06    1.725  -0.513    0.690 -0.2052
2018-03-07    1.745  -0.511    0.698 -0.2044
2018-03-08    1.750  -0.512    0.700 -0.2048

Remove first level:

df.columns = df.columns.droplevel(0)
print (df)
            PX_FIRST  PX_LAST  PX_FIRST  PX_LAST
Date                                            
2018-03-05     1.710   -0.511     0.684  -0.2044
2018-03-06     1.725   -0.513     0.690  -0.2052
2018-03-07     1.745   -0.511     0.698  -0.2044
2018-03-08     1.750   -0.512     0.700  -0.2048

If select column PX_FIRST it return DataFrame, because duplicated columns names:

print (df['PX_FIRST'])
            PX_FIRST  PX_FIRST
Date                          
2018-03-05     1.710     0.684
2018-03-06     1.725     0.690
2018-03-07     1.745     0.698
2018-03-08     1.750     0.700

If need select by both levels, use tuples:

print (df[('Price', 'PX_FIRST')])
Date
2018-03-05    1.710
2018-03-06    1.725
2018-03-07    1.745
2018-03-08    1.750
Name: (Price, PX_FIRST), dtype: float64

Nice one sir, learn new thing , Never thought xs. can be used like this — BENY, Apr 04 '18 at 16:41

score 3 · Answer 2 · answered Apr 04 '18 at 16:38

3

IIUC multiple index

df.loc[:,pd.IndexSlice['Price']]
Out[1108]: 
            PX_FIRST  PX_LAST
Date                         
2018-03-05     1.710   -0.511
2018-03-06     1.725   -0.513
2018-03-07     1.745   -0.511
2018-03-08     1.750   -0.512

answered Apr 04 '18 at 16:38

BENY

317,841
20
164
234

score 0 · Answer 3 · answered Apr 05 '18 at 07:28

0

@jezrael You are exactly right when I drop one level I end up with a duplicate column name and it is hard to distinguish columns unless I rename them?

The other challenge in your example below

            PX_FIRST  PX_FIRST
Date                          
2018-03-05     1.710     0.684
2018-03-06     1.725     0.690
2018-03-07     1.745     0.698
2018-03-08     1.750     0.700

is that column "Date", "PX_FIRST" and "PX_FIRST" are in different levels so I call df[['Date','PX_FIRST','PX_FIRST']] i get an error "...not in index"

Ideally, i d be looking to get

 Date          PX_FIRST  PX_LAST                              
2018-03-05     1.710     0.684
2018-03-06     1.725     0.690
2018-03-07     1.745     0.698
2018-03-08     1.750     0.700

All column names are on a similar level and have different names

Thanks

answered Apr 05 '18 at 07:28

SBad

1,245
5
23
36

1

You need `df = df.reset_index()`, then is possible use `df[['Date','PX_FIRST','PX_FIRST']]` – jezrael Apr 05 '18 at 07:32
1

because first column is index and `Date` is index name, not column name – jezrael Apr 05 '18 at 07:33
1

@jezrael Thanks a lot. because the structure of my data is unusual. I had to drop 2 levels and reindex the date to finally get the desired results. Thank you again – SBad Apr 05 '18 at 08:04

How to access a column within a column using pandas

3 Answers3