
I have a df with a multiindex with 2 levels. One of these levels, age, is used to generate another column, Numeric Age.

Currently, my idea is to reset_index, use apply with age_func which reads row["age"], and then re-set the index, something like...

df = df.reset_index("age")
df["Numeric Age"] = df.apply(age_func, axis=1)
df = df.set_index("age") # ValueError: cannot reindex from a duplicate axis

This strikes me as a bad idea. I'm having a hard time resetting the indices correctly, and I think this is probably a slow way to go.

What is the correct way to make a new column based on the values of one of your indices? Or, if this is the correct way to go, is there a way to re-set the indices such that the df is the exact same as when I started, with the new column added?

David Jay Brady
  • Could you provide some sample data and sample of what `age_func` is? It's difficult to determine what the goal is and/or why this code is not working without something to test. – Henry Ecker Sep 06 '22 at 00:58
  • Duplicated column name, plz check this [post](https://stackoverflow.com/questions/60270081/valueerror-cannot-reindex-from-a-duplicate-axis-in-groupby-pandas) – Baron Legendre Sep 06 '22 at 03:14

1 Answer


We can create the new column with .loc and fill in only the rows we need by selecting them with a boolean mask. The same mask is applied on the right-hand side, so the values are computed from the matching rows only.

The first step is to build a mask for the rows to target.

mask_foo = df.index.get_level_values("age") == "foo"
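To make the mask concrete, here is a minimal sketch on made-up data. The second level name `group`, the index values, and the contents of `some_other_column` are assumptions for illustration, not taken from the question.

```python
import pandas as pd

# Hypothetical sample frame: a two-level MultiIndex that includes an "age" level.
index = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("a", "bar"), ("b", "foo")],
    names=["group", "age"],
)
df = pd.DataFrame(
    {"some_other_column": ["3 years", "18 months", "7 years"]},
    index=index,
)

# get_level_values returns the "age" label of every row as an Index;
# comparing it with a scalar yields a boolean array, one entry per row.
mask_foo = df.index.get_level_values("age") == "foo"
print(mask_foo)
```

The mask is positional, so it can be used both in `df[mask_foo]` to select rows and as the row indexer of `df.loc[...]`.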

Next, write a function that handles a single row selected by mask_foo. It will be passed to .apply(axis=1) in the last step.

def calc_foo_numeric_age(row):
    # The logic here isn't important, the point is we have access to the row
    return row["some_other_column"].split(" ")[0]

And now the .loc magic

df.loc[mask_foo, "Numeric Age"] = df[mask_foo].apply(calc_foo_numeric_age, axis=1)

Repeat the process for each other value of the age level that you need to handle.
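Put together, the steps might look like the sketch below. The sample data, the `group` level, the `bar` label, and `calc_bar_numeric_age` are all hypothetical; only `calc_foo_numeric_age` follows the function shown earlier (with a `float` conversion added so the column is numeric).

```python
import pandas as pd

# Hypothetical sample frame with an "age" index level.
index = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("a", "bar"), ("b", "foo")],
    names=["group", "age"],
)
df = pd.DataFrame(
    {"some_other_column": ["3 years", "18 months", "7 years"]},
    index=index,
)

def calc_foo_numeric_age(row):
    # "foo" rows: the leading number is already in years.
    return float(row["some_other_column"].split(" ")[0])

def calc_bar_numeric_age(row):
    # "bar" rows (assumed for this example): the leading number is in months.
    return float(row["some_other_column"].split(" ")[0]) / 12

# One handler per value of the "age" level.
handlers = {"foo": calc_foo_numeric_age, "bar": calc_bar_numeric_age}

ages = df.index.get_level_values("age")
for label, func in handlers.items():
    mask = ages == label
    # The first assignment creates the column (NaN elsewhere);
    # later iterations fill in the rows for their own label.
    df.loc[mask, "Numeric Age"] = df[mask].apply(func, axis=1)

print(df)
```

Keeping the handlers in a dict avoids copying the mask-and-assign lines once per label, and the index is never touched.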

If your situation allows you to reset_index().apply(axis=1), I recommend that over this. I am doing this because I have other reasons for not wanting to reset_index().
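For comparison, a reset_index round trip could be sketched as follows. This assumes the index levels are named and that resetting every level is acceptable; the sample data and this `age_func` are hypothetical stand-ins for the ones in the question.

```python
import pandas as pd

# Hypothetical sample frame with an "age" index level.
index = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("a", "bar"), ("b", "foo")],
    names=["group", "age"],
)
df = pd.DataFrame(
    {"some_other_column": ["3 years", "18 months", "7 years"]},
    index=index,
)

def age_func(row):
    # Stand-in for the question's age_func: reads row["age"] as a normal column.
    number = float(row["some_other_column"].split(" ")[0])
    return number if row["age"] == "foo" else number / 12

# Remember the level names so the same index can be rebuilt afterwards.
level_names = list(df.index.names)

flat = df.reset_index()  # every index level becomes an ordinary column
flat["Numeric Age"] = flat.apply(age_func, axis=1)
restored = flat.set_index(level_names)  # rebuild the original MultiIndex

print(restored)
```

Note that `set_index("age")` on its own replaces whatever index is left after a partial reset; passing all level names (or using `append=True` after resetting a single level) keeps the other levels.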

David Jay Brady