69

I have a DataFrame with a MultiIndex and would like to modify one particular level of that MultiIndex. For instance, one level might hold strings and I may want to remove the whitespace from that level:

df.index.levels[1] = [x.replace(' ', '') for x in df.index.levels[1]]

However, the code above results in an error:

TypeError: 'FrozenList' does not support mutable operations.

I know I can call reset_index, modify the column, and then re-create the MultiIndex, but I wonder whether there is a more elegant way to modify one particular level of the MultiIndex directly.

denfromufa

3 Answers

64

Thanks to @cxrodgers's comment, I think the fastest way to do this is:

df.index = df.index.set_levels(df.index.levels[0].str.replace(' ', ''), level=0)
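A minimal, self-contained sketch of the one-liner above. The DataFrame, its `value` column, and the index labels are made up for illustration:

```python
import pandas as pd

# Hypothetical two-level index whose first level contains spaces.
df = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=pd.MultiIndex.from_product([["a b", "c d"], ["x", "y"]]),
)

# levels[0] holds only the distinct labels, so the string operation
# runs once per distinct label rather than once per row.
df.index = df.index.set_levels(df.index.levels[0].str.replace(" ", ""), level=0)

print(df.index.tolist())
# [('ab', 'x'), ('ab', 'y'), ('cd', 'x'), ('cd', 'y')]
```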

Old, longer answer:

I found that the list comprehension suggested by @Shovalt works but felt slow on my machine (using a dataframe with >10,000 rows).

Instead, I was able to use the .set_levels method, which was quite a bit faster for me.

%timeit pd.MultiIndex.from_tuples([(x[0].replace(' ',''), x[1]) for x in df.index])
1 loop, best of 3: 394 ms per loop

%timeit df.index.set_levels(df.index.get_level_values(0).str.replace(' ',''), level=0)
10 loops, best of 3: 134 ms per loop

In actuality, I just needed to prepend some text. This was even faster with .set_levels:

%timeit pd.MultiIndex.from_tuples([('00'+x[0], x[1]) for x in df.index])
100 loops, best of 3: 5.18 ms per loop

%timeit df.index.set_levels('00'+df.index.get_level_values(0), level=0)
1000 loops, best of 3: 1.38 ms per loop

%timeit df.index.set_levels('00'+df.index.levels[0], level=0)
1000 loops, best of 3: 331 µs per loop

This solution is based on the answer in the link from the comment by @denfromufa ...

python - Multiindex and timezone - Frozen list error - Stack Overflow

John
  • This seems faster and more elegant than constructing a new index. I would also add that in most cases you would just do `inplace=True`. – cxrodgers Feb 21 '18 at 23:08
  • Actually I think your code contains an error, it should be `df.index.levels[0]` wherever you have `df.index.get_level_values(0)`. This is also how they do it in the answer that you link – cxrodgers Feb 21 '18 at 23:22
  • Is `.get_level_values` not available for you? Which version of pandas are you using? I'm on v0.22.0 and both seem to give me the same result, but your recommendation using simply `.levels[0]` is much faster than `.get_level_values(0)`. I'll add this to my answer. – John Feb 23 '18 at 13:29
  • `get_level_values` does not do the same thing as `levels` .... I don't totally understand it but the first gives you the value of that level for every row, whereas `levels` only gives you the distinct level values, or something like that. – cxrodgers Feb 23 '18 at 19:03
  • @John +1 but use `df.index.unique(level=0)` instead of `df.index.levels[0]` or `df.index.get_level_values(0)`. It is safer and made for this case. Especially for `get_level_values` which can have conflicts on repeated level entries. – Little Bobby Tables Jun 15 '18 at 15:21
  • The comment by @LittleBobbyTables was ^exactly^ what I needed. In general if you're working with a multiindex, you really need to use the unique, otherwise the set_levels call will fail with a 'Level values must be unique' error – Brian Wylie Jan 03 '20 at 20:34
  • @John, @cxrodgers's comment is right, the solution with `df.index.get_level_values(0)` will fail as soon as the values are not unique. – Jacquot May 07 '20 at 13:13
37

As mentioned in the comments, indexes are immutable and must be remade when modifying, but you do not have to use reset_index for that, you can create a new multi-index directly:

df.index = pd.MultiIndex.from_tuples([(x[0], x[1].replace(' ', ''), x[2]) for x in df.index])

This example is for a 3-level index where you want to modify the middle level. Adjust the length of the tuple to match the number of levels in your own index.
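To avoid hand-writing the tuple for each index depth, the same rebuild can be sketched generically. The helper name `replace_in_level` and the sample index are hypothetical, and passing `names=` is an addition here so that level names survive the rebuild:

```python
import pandas as pd

def replace_in_level(index, level, old, new):
    """Rebuild a MultiIndex, applying str.replace to one positional level."""
    tuples = [
        # Only the entry at position `level` is changed; the rest pass through.
        tuple(v.replace(old, new) if i == level else v for i, v in enumerate(t))
        for t in index
    ]
    return pd.MultiIndex.from_tuples(tuples, names=index.names)

idx = pd.MultiIndex.from_tuples(
    [("x", "a b", 1), ("y", "c d", 2)], names=["k1", "k2", "k3"]
)
new_idx = replace_in_level(idx, level=1, old=" ", new="")

print(new_idx.tolist())
# [('x', 'ab', 1), ('y', 'cd', 2)]
```

This still touches every row, so it keeps the performance profile of the tuple-based approach.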

Update

John's improvement is great performance-wise, but as mentioned in the comments, the variant that passes get_level_values to set_levels raises an error as soon as the level values repeat. So here's the corrected implementation with small improvements:

# The `inplace` argument of set_levels was removed in pandas 2.0,
# so assign the returned index back to the DataFrame.
df.index = df.index.set_levels(
    df.index.levels[0].str.replace(' ', ''),
    level=0,
)
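To see why `levels` rather than `get_level_values` is the right input for `set_levels`, here is a small sketch on a made-up index: `levels[0]` contains each distinct label once, while `get_level_values(0)` contains one label per row, so it has duplicates whenever a label repeats.

```python
import pandas as pd

# Hypothetical index: each first-level label appears on two rows.
index = pd.MultiIndex.from_product([["a b", "c d"], [1, 2]])

print(list(index.levels[0]))            # ['a b', 'c d']
print(list(index.get_level_values(0)))  # ['a b', 'a b', 'c d', 'c d']

# Passing the per-row values as the new level fails the integrity check,
# because level values must be unique.
try:
    index.set_levels(index.get_level_values(0).str.replace(" ", ""), level=0)
    failed = False
except ValueError:
    failed = True

# Passing the distinct level values works.
fixed = index.set_levels(index.levels[0].str.replace(" ", ""), level=0)
print(failed)          # True
print(fixed.tolist())  # [('ab', 1), ('ab', 2), ('cd', 1), ('cd', 2)]
```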

If you'd like to use level names instead of numbers, you'll need another small variation:

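# As above, assign the result back instead of passing `inplace=True`
# (that argument was removed in pandas 2.0).
df.index = df.index.set_levels(
    df.index.levels[df.index.names.index('level_name')].str.replace(' ', ''),
    level='level_name',
)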
Shovalt
6

The other answers work fine. However, depending on the structure of the MultiIndex, it can be considerably faster to apply a map directly to the levels instead of constructing a new MultiIndex.

I use the following function to modify a particular index level. It also works on single-level indices.

import pandas as pd

def map_index_level(index, mapper, level=0):
    """
    Returns a new Index or MultiIndex, with the level values being mapped.
    """
    assert isinstance(index, pd.Index)
    if isinstance(index, pd.MultiIndex):
        # Map only the distinct level values, then swap them in.
        new_level = index.levels[level].map(mapper)
        new_index = index.set_levels(new_level, level=level)
    else:
        # Single-level index.
        assert level == 0
        new_index = index.map(mapper)
    return new_index

Usage:

df = pd.DataFrame([[1,2],[3,4]])
df.index = pd.MultiIndex.from_product([["a"],["i","ii"]])
df.columns = ["x","y"]

df.index = map_index_level(index=df.index, mapper=str.upper, level=1)
df.columns = map_index_level(index=df.columns, mapper={"x":"foo", "y":"bar"})

# Result:
#       foo  bar
# a I     1    2
#   II    3    4

Note: The above works only if mapper preserves the uniqueness of the level values! In the above example, mapper = {"i": "new", "ii": "new"} will fail in set_levels() with a ValueError: Level values must be unique. One could disable the integrity check by modifying the above code to:

new_index = index.set_levels(new_level, level=level,
                             verify_integrity=False)

But it is better not to! See the docs of set_levels.
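If the mapper may collapse distinct labels into one, a different technique than `set_levels` is needed. The sketch below first reproduces the failure, then rebuilds the index from per-row values with `MultiIndex.from_arrays`; this rebuild is a swapped-in alternative rather than part of the function above, and the `collapse` mapper is the hypothetical one from the note.

```python
import pandas as pd

index = pd.MultiIndex.from_product([["a"], ["i", "ii"]])
collapse = {"i": "new", "ii": "new"}  # maps two distinct labels to one

# set_levels rejects the mapped level because its values are not unique.
try:
    index.set_levels(index.levels[1].map(collapse), level=1)
    raised = False
except ValueError:
    raised = True

# Alternative: map the per-row values and build a fresh MultiIndex,
# which recomputes levels and codes from scratch.
arrays = [index.get_level_values(i) for i in range(index.nlevels)]
arrays[1] = arrays[1].map(collapse)
rebuilt = pd.MultiIndex.from_arrays(arrays, names=index.names)

print(raised)            # True
print(rebuilt.tolist())  # [('a', 'new'), ('a', 'new')]
```

The rebuild works row by row, so it gives up the speed advantage of mapping only the distinct level values.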

normanius