Pandas: How to Find Data from Last Filled-in Column

Question

I have a data set with different levels and am trying to figure out a succinct way of pulling out the data from the lowest level for each entry. Here's some sample data with expected output. The blank cells are zeroes. This feels like it should be straightforward and I am running into problems. Any advice would be appreciated.

ID	Lvl 1	Lvl 2	Lvl 3	Lvl 4	Desired Output
1	aaa	0	0	0	aaa
2	aaa	bbb	ccc	0	ccc
3	aaa	bbb	ccc	0	ccc
4	aaa	bbb	ccc	ddd	ddd

I tried building a function with conditionals starting at the lowest level and working backward, but that seems like overkill.

I think something like `df[['Lvl 1', 'Lvl 2', 'Lvl 3', 'Lvl 4']].ffill(axis=1).iloc[:, -1]` would work. — Nick ODell, Jul 11 '23 at 17:08
See also: https://stackoverflow.com/questions/38152389/coalesce-values-from-2-columns-into-a-single-column-in-a-pandas-dataframe — Nick ODell, Jul 11 '23 at 17:10

score 0 · Answer 1 · answered Jul 11 '23 at 17:14

0

You can create a custom function:

def last_filled_column(row):
    out = 0
    for value in row.values():
        if value != 0:
            out = value
    return out

And then you can just apply this function to your df like so:

df["output"] = df.apply(last_filled_column, axis=1)

answered Jul 11 '23 at 17:14

Ivan Popov

41
3

Thank you - this threw a "'numpy.ndarray' object is not callable" error. Not sure why. – jpette Jul 11 '23 at 18:01

score 0 · Accepted Answer · answered Jul 11 '23 at 17:16

I've created the following function:

import pandas as pd
import numpy as np

def extract_lowest_level(df):
    result_df = df.copy()
    result_df = df.replace(0, np.nan)
    result_df['Desired Output'] = result_df.apply(lambda row: row[row.last_valid_index()], axis=1)
    result_df = result_df.fillna(0)
    return result_df

That generates your desired input.

You can test it with the following code:

data = {
    'ID': [1, 2, 3, 4],
    'Lvl 1': ['aaa', 'aaa', 'aaa', 'aaa'],
    'Lvl 2': [0, 'bbb', 'bbb', 'bbb'],
    'Lvl 3': [0, 'ccc', 'ccc', 'ccc'],
    'Lvl 4': [0, 0, 0, 'ddd']
}

df = pd.DataFrame(data)
print(df)

result_df = extract_lowest_level(df)

print(result_df)

Pandas: How to Find Data from Last Filled-in Column

2 Answers2