How to perform string operation on an entire Pandas MultiIndex?

Question

I have a pandas dataframe with a two-level column index. It's read in from a spreadsheet where the author used a lot of whitespace to accomplish things like alignment (for example, one column is called 'Tank #').

I've been able to remove the whitespace on the levels individually...

level0 = raw.columns.levels[0].str.replace('\s', '', regex=True)
level1 = raw.columns.levels[1].str.replace('\s', '', regex=True)
raw.columns.set_levels([level0, level1], inplace=True)

...but I'm curious if there is a way to do it without having to change each individual level one at a time.

I tried raw.columns.set_levels(raw.columns.str.replace('\s', '', regex=True) but got AttributeError: Can only use .str accessor with Index, not MultiIndex.

Here is a small sample subset of the data-- my best attempt at SO table formatting :D, followed by a picture where I've highlighted in yellow the indices as received.

	Run Info	Run Info	Run Data	Run Data
	run #	Tank #	Step A	conc. %
			ph
0	6931	5	5.29	33.14
1	6932	1	5.28	33.13
2	6933	2	5.32	33.40
3	6934	3	5.19	32.98

Thanks for any insight!

Edit: adding to_dict()

df.to_dict()
Out[5]: 
{'Unnamed: 0': {0: nan, 1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0, 5: 4.0},
 'Run Info': {0: 'run #',
  1: '6931',
  2: '6932',
  3: '6933',
  4: '6934',
  5: '6935'},
 'Run Info.1': {0: 'Tank                             #',
  1: '5',
  2: '1',
  3: '2',
  4: '3',
  5: '4'},
 'Run Data': {0: 'Step A\npH',
  1: '5.29',
  2: '5.28',
  3: '5.32',
  4: '5.19',
  5: '5.28'},
 'Run Data.1': {0: 'concentration',
  1: '33.14',
  2: '33.13',
  3: '33.4',
  4: '32.98',
  5: '32.7'}}

This is where `df.to_dict()` would be a good choice to include your sample data. — Quang Hoang, Feb 03 '21 at 18:03
that refers to *my best attempt at SO table formatting :D*. you could include `df.to_dict()` instead of copy/paste the raw output. — Quang Hoang, Feb 03 '21 at 18:42
Could you be more explicit? I don't see how it helps (but I've added it anyway). — JJL, Feb 03 '21 at 21:03
I mean instead of include the markdown text table, which obviously has the wrong header/column structure, you could include the **output** of `df.to_dict()`. People can just copy and paste **your code** and regenerate the data with correct column structure. Also refer to [this guide](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) especially for the multiIndex section. Thanks for upvoting/accepting my answer btw. — Quang Hoang, Feb 03 '21 at 21:07
Love the guide -- thanks -- that will be a lot of help in the future. — JJL, Feb 03 '21 at 21:25

score 3 · Accepted Answer · answered Feb 03 '21 at 18:09

3

How about rename:

import re

df.rename(columns=lambda x: re.sub('\s+', ' ', x.strip() ),inplace=True)

If you don't want to keep any of the spaces, you can just replace ' ' with ''.

answered Feb 03 '21 at 18:09

Quang Hoang

146,074
10
56
74

1

Nice! Very clever - TYVM. Every time I work with MultiIndexes I fall down rabbit holes and it takes me 3 or 4 hours to climb out :) – JJL Feb 03 '21 at 18:29

How to perform string operation on an entire Pandas MultiIndex?

1 Answers1