0

I have a data set with different levels and am trying to figure out a succinct way of pulling out the data from the lowest level for each entry. Here's some sample data with expected output. The blank cells are zeroes. This feels like it should be straightforward and I am running into problems. Any advice would be appreciated.

ID Lvl 1 Lvl 2 Lvl 3 Lvl 4 Desired Output
1 aaa 0 0 0 aaa
2 aaa bbb ccc 0 ccc
3 aaa bbb ccc 0 ccc
4 aaa bbb ccc ddd ddd

I tried building a function with conditionals starting at the lowest level and working backward, but that seems like overkill.

jpette
  • 3
  • 2
  • I think something like `df[['Lvl 1', 'Lvl 2', 'Lvl 3', 'Lvl 4']].ffill(axis=1).iloc[:, -1]` would work. – Nick ODell Jul 11 '23 at 17:08
  • See also: https://stackoverflow.com/questions/38152389/coalesce-values-from-2-columns-into-a-single-column-in-a-pandas-dataframe – Nick ODell Jul 11 '23 at 17:10

2 Answers2

0

You can create a custom function:

def last_filled_column(row):
    out = 0
    for value in row.values():
        if value != 0:
            out = value
    return out

And then you can just apply this function to your df like so:

df["output"] = df.apply(last_filled_column, axis=1)
Ivan Popov
  • 41
  • 3
0

I've created the following function:

import pandas as pd
import numpy as np

def extract_lowest_level(df):
    result_df = df.copy()
    result_df = df.replace(0, np.nan)
    result_df['Desired Output'] = result_df.apply(lambda row: row[row.last_valid_index()], axis=1)
    result_df = result_df.fillna(0)
    return result_df

That generates your desired input.

You can test it with the following code:

data = {
    'ID': [1, 2, 3, 4],
    'Lvl 1': ['aaa', 'aaa', 'aaa', 'aaa'],
    'Lvl 2': [0, 'bbb', 'bbb', 'bbb'],
    'Lvl 3': [0, 'ccc', 'ccc', 'ccc'],
    'Lvl 4': [0, 0, 0, 'ddd']
}

df = pd.DataFrame(data)
print(df)

result_df = extract_lowest_level(df)

print(result_df)
Jino Michel Aque
  • 513
  • 1
  • 4
  • 16