0

I have a pandas dataframe with a two-level column index. It's read in from a spreadsheet where the author used a lot of whitespace to accomplish things like alignment (for example, one column is called 'Tank #').

I've been able to remove the whitespace on the levels individually...

level0 = raw.columns.levels[0].str.replace('\s', '', regex=True)
level1 = raw.columns.levels[1].str.replace('\s', '', regex=True)
raw.columns.set_levels([level0, level1], inplace=True)

...but I'm curious if there is a way to do it without having to change each individual level one at a time.

I tried raw.columns.set_levels(raw.columns.str.replace('\s', '', regex=True) but got AttributeError: Can only use .str accessor with Index, not MultiIndex.

Here is a small sample subset of the data-- my best attempt at SO table formatting :D, followed by a picture where I've highlighted in yellow the indices as received.

Run Info Run Info Run Data Run Data
run # Tank # Step A conc. %
ph
0 6931 5 5.29 33.14
1 6932 1 5.28 33.13
2 6933 2 5.32 33.40
3 6934 3 5.19 32.98

sample data

Thanks for any insight!

Edit: adding to_dict()

df.to_dict()
Out[5]: 
{'Unnamed: 0': {0: nan, 1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0, 5: 4.0},
 'Run Info': {0: 'run #',
  1: '6931',
  2: '6932',
  3: '6933',
  4: '6934',
  5: '6935'},
 'Run Info.1': {0: 'Tank                             #',
  1: '5',
  2: '1',
  3: '2',
  4: '3',
  5: '4'},
 'Run Data': {0: 'Step A\npH',
  1: '5.29',
  2: '5.28',
  3: '5.32',
  4: '5.19',
  5: '5.28'},
 'Run Data.1': {0: 'concentration',
  1: '33.14',
  2: '33.13',
  3: '33.4',
  4: '32.98',
  5: '32.7'}}
JJL
  • 168
  • 2
  • 8
  • This is where `df.to_dict()` would be a good choice to include your sample data. – Quang Hoang Feb 03 '21 at 18:03
  • @QuangHoang I don't understand your comment... – JJL Feb 03 '21 at 18:30
  • that refers to *my best attempt at SO table formatting :D*. you could include `df.to_dict()` instead of copy/paste the raw output. – Quang Hoang Feb 03 '21 at 18:42
  • Could you be more explicit? I don't see how it helps (but I've added it anyway). – JJL Feb 03 '21 at 21:03
  • 1
    I mean instead of include the markdown text table, which obviously has the wrong header/column structure, you could include the **output** of `df.to_dict()`. People can just copy and paste **your code** and regenerate the data with correct column structure. Also refer to [this guide](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) especially for the multiIndex section. Thanks for upvoting/accepting my answer btw. – Quang Hoang Feb 03 '21 at 21:07
  • Love the guide -- thanks -- that will be a lot of help in the future. – JJL Feb 03 '21 at 21:25

1 Answers1

3

How about rename:

import re

df.rename(columns=lambda x: re.sub('\s+', ' ', x.strip() ),inplace=True)

If you don't want to keep any of the spaces, you can just replace ' ' with ''.

Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • 1
    Nice! Very clever - TYVM. Every time I work with MultiIndexes I fall down rabbit holes and it takes me 3 or 4 hours to climb out :) – JJL Feb 03 '21 at 18:29