Apply forward to multiple columns in dataframe with the same name

Question

I have a dataframe that has columns with a repeated name. I have this code that does a forward-fill for blank values. However, it only does it on the first column found with that name.

Example dataframe:

import pandas as pd

data_dict = {'name1': [1.0, '', 2.0, '', 5.0], 'name2': [10.0, '', 14.0, 18.0, ''], 'name3': ['some string2', 'some string3', 'some string4', 'some string5', 'some string6'], 'name4': ['description2', 'description3', 'description4', 'description5', 'description6'], 'name2.1': [36.0, '', '', 44.0, ''], 'name6': ['more text2', 'more text3', 'more text4', 'more text5', 'more text6']}

df = pd.DataFrame.from_dict(data_dict)
df

Dataframe:

name1   name2   name3          name4           name2    name6
1       10      some string2    description2    36      more text2
                some string3    description3            more text3
2       14      some string4    description4            more text4
        18      some string5    description5    44      more text5
5               some string6    description6            more text6

Here is my code:

backfill_header_list = [
    'Name1',
    'Name2'
]    
for i in backfill_header_list:
    df.loc[:, i] = df.loc[:, i].fillna(method='ffill')

For this example, Name2 is the repeated column name.

Desired output:

name1   name2   name3          name4           name2    name6
1       10      some string2    description2    36      more text2
1       10      some string3    description3    36      more text3
2       14      some string4    description4    36      more text4
2       18      some string5    description5    44      more text5
5       18      some string6    description6    44      more text6

Is there an efficient way to have pandas iterate through all columns that match that name?

Could you provide an example of `df`, so that we don't have to make one up ourselves? See [How to make good reproducible pandas examples](/q/20109391/4518341). See also [mre]. — wjandrea, Aug 27 '22 at 22:28
@MikeMann Please post the output of `df.to_dict()` or something like that. The blank cells here make it impossible to convert to df directly. — wjandrea, Aug 27 '22 at 22:40
@MikeMann It's even possible that you have null strings where you thought you had missing values — wjandrea, Aug 27 '22 at 22:41
Well, you CAN'T do `to_dict`, because a dict can't have duplicate keys. You can do `df['name2']` to return both of the columns, but in the end duplicate column names are just going to hurt you. — Tim Roberts, Aug 27 '22 at 22:57
@TimRoberts that's right. when I import the dataframe from a raw excel file I have the duplicate names, but in the dict it wrote it to 2.1. My end goal is to fill in the blanks and then proceed with slicing up the raw dataframe to make it a manageable dataset. — Mike Mann, Aug 27 '22 at 22:58
`df.loc[:,backfill_header_list].fillna(method='ffill')` will give you your desired result, but if you want to assign it back to your dataframe, it will show an error `ValueError: Setting with non-unique columns is not allowed.` — , Aug 27 '22 at 23:26

score 1 · Answer 1 · 2022-08-27T23:38:22.003

If you already have the list of names to be matched, just referencing will be enough.

data = {'a':[1,np.nan,2,np.nan,5],'b':[10,np.nan,14,18,np.nan],'c':['somestrings']*5,'d':['descriptions']*5,'e':[36,np.nan,np.nan,44,np.nan], 'f':['more texts']*5}
df= pd.DataFrame(data)
df.columns = ['name1','name2','name3','name4','name2','name6']

df.loc[:,backfill_header_list].fillna(method='ffill')
 
   name1  name2  name2
0    1.0   10.0   36.0
1    1.0   10.0   36.0
2    2.0   14.0   36.0
3    2.0   18.0   44.0
4    5.0   18.0   44.0

However, if you want to assign it back to your original dataframe, it can't be done due to the same multiple column names.

df.loc[:,backfill_header_list] = df.loc[:,backfill_header_list].fillna(method='ffill')

ValueError: Setting with non-unique columns is not allowed.

So, what you can do is using df.apply.

df[backfill_header_list] = df[backfill_header_list].apply(lambda x : x.fillna(method='ffill'))

   name1  name2        name3         name4  name2       name6
0    1.0   10.0  somestrings  descriptions   36.0  more texts
1    1.0   10.0  somestrings  descriptions   36.0  more texts
2    2.0   14.0  somestrings  descriptions   36.0  more texts
3    2.0   18.0  somestrings  descriptions   44.0  more texts
4    5.0   18.0  somestrings  descriptions   44.0  more texts

However, it's better not to have the same column names in your dataframe because it will certainly cause you many problems later on.

score 0 · Accepted Answer · answered Aug 27 '22 at 23:49

In case anyone else comes across this problem, here's what I came up with.

I essentially created a function to compare the headers to my backfill_header_list and create a list of indexes to loop through in my backfill for loop.

headers = list(df.columns)

backfill_header_list = [
    'Name1',
    'Name2'
backfill_index_list = []
    for i, col in enumerate(headers):
        if col in backfill_header_list:
            backfill_index_list.append(i)

for i in backfill_index_list:
    df.iloc[:, i] = df.iloc[:, i].fillna(method='ffill')

It can still work since it now involves in column index, not the name itself :) — , Aug 27 '22 at 23:51

score 0 · Answer 3 · answered Aug 28 '22 at 00:04

I think you can use this code below it simialr to @Kevin answer

df.loc[:,df.columns].fillna(method='ffill')

if you want to modify last column of you duplicated column you can use

df.loc[:,~df.T.duplicated(keep='last')]..fillna(method='ffill')

but as all comments
Duplicate column names are a problem if you plan to transfer your data set to another statistical language. They're also a problem because it will cause unanticipated and sometimes difficult to debug problems in Python.

Apply forward to multiple columns in dataframe with the same name

3 Answers3