Alternative to reset_index().apply() to create new column based off index values

Question

I have a df with a multiindex with 2 levels. One of these levels, age, is used to generate another column, Numeric Age.

Currently, my idea is to reset_index, use apply with age_func which reads row["age"], and then re-set the index, something like...

df = df.reset_index("age")
df["Numeric Age"] = df.apply(age_func, axis=1)
df = df.set_index("age") # ValueError: cannot reindex from a duplicate axis

This strikes me as a bad idea. I'm having a hard time resetting the indices correctly, and I think this is probably a slow way to go.

What is the correct way to make a new column based on the values of one of your indices? Or, if this is the correct way to go, is there a way to re-set the indices such that the df is the exact same as when I started, with the new column added?

Could you provide some sample data and sample of what `age_func` is? It's difficult to determine what the goal is and/or why this code is not working without something to test. — Henry Ecker, Sep 06 '22 at 00:58
Duplicated column name, plz check this [post](https://stackoverflow.com/questions/60270081/valueerror-cannot-reindex-from-a-duplicate-axis-in-groupby-pandas) — Baron Legendre, Sep 06 '22 at 03:14

score 0 · Answer 1 · answered Sep 06 '22 at 21:10

We can create the new column with .loc and fill in only the rows we need by using a boolean mask built from the index level. The same mask also selects the rows whose column values feed the calculation.

The first step is to build a mask for the rows to target.

mask_foo = df.index.get_level_values("age") == "foo"

Later we will use .apply(axis=1), so write a function to handle the rows you will have from mask_foo.

def calc_foo_numeric_age(row):
    # The logic here isn't important, the point is we have access to the row
    return row["some_other_column"].split(" ")[0]

And now the .loc magic:

df.loc[mask_foo, "Numeric Age"] = df.loc[mask_foo].apply(calc_foo_numeric_age, axis=1)

Repeat the process for each of the other index values you need to target.
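Here is a minimal end-to-end sketch of the steps above. The sample data, the "name" index level and the contents of some_other_column are made up for illustration, since the question did not include any data. The int() conversion is also my addition so the result is numeric.

```python
import pandas as pd

# Hypothetical sample data: a two-level MultiIndex ("name", "age").
index = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("b", "bar"), ("c", "foo")],
    names=["name", "age"],
)
df = pd.DataFrame(
    {"some_other_column": ["3 years", "7 months", "12 years"]},
    index=index,
)

# Boolean mask over the rows whose "age" index level equals "foo".
mask_foo = df.index.get_level_values("age") == "foo"


def calc_foo_numeric_age(row):
    # Illustrative logic only: take the leading number from another column.
    return int(row["some_other_column"].split(" ")[0])


# Create/fill "Numeric Age" for the masked rows only; other rows stay NaN.
df.loc[mask_foo, "Numeric Age"] = df.loc[mask_foo].apply(
    calc_foo_numeric_age, axis=1
)

print(df)
```

The MultiIndex is never reset, so the frame keeps its original index and only gains the new column.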

If your situation allows you to reset_index().apply(axis=1), I recommend that over this. I am doing this because I have other reasons for not wanting to reset_index().
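For the simpler case where the new column depends only on the index value, here is a rough sketch of two options. I am assuming a hypothetical age_func that takes a single age value rather than a whole row, and the sample data is invented. The first option maps the function over the index level directly. The second does the reset_index round trip and uses set_index(..., append=True) so the other index level is kept.

```python
import pandas as pd

# Hypothetical sample data: a two-level MultiIndex ("name", "age").
index = pd.MultiIndex.from_tuples(
    [("a", "10y"), ("b", "7y"), ("c", "10y")],
    names=["name", "age"],
)
df = pd.DataFrame({"value": [1, 2, 3]}, index=index)


def age_func(age):
    # Assumed stand-in for the asker's function: "10y" -> 10.
    return int(age.rstrip("y"))


# Option 1: map over the index level; the index is never touched.
direct = df.copy()
direct["Numeric Age"] = direct.index.get_level_values("age").map(age_func)

# Option 2: reset only the "age" level, compute, then append it back.
roundtrip = df.reset_index("age")
roundtrip["Numeric Age"] = roundtrip["age"].map(age_func)
roundtrip = roundtrip.set_index("age", append=True)
# Restore the original level order in case "age" was not the last level.
roundtrip = roundtrip.reorder_levels(df.index.names)

print(direct)
print(roundtrip)
```

Without append=True, set_index("age") replaces the whole index with "age" and discards the remaining level, which is probably why the round trip in the question did not give back the original frame.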
