Populating a column based on values in another column - pandas

Question

After merging two data frames I have some gaps in my data frame that can be filled in based on neighboring columns (I have many more columns, and rows in the DF but I'm focusing on these three columns): Example DF:

Unique ID | Type | Location

A         1       Land
A         NaN     NaN       
B         2       sub
B         NaN     NaN      
C         3       Land
C         3       Land

Ultimately I want the three columns to be filled in:

Unique ID | Type | Location

A         1       Land
A         1       Land     
B         2       sub
B         2       sub    
C         3       Land
C         3       Land

I've tried:

df.loc[df.Type.isnull(), 'Type'] = df.loc[df.Type.isnull(), 'Unique ID'].map(df.loc[df.Type.notnull()].set_index('Unique ID')['Type'])

but it throws: InvalidIndexError: Reindexing only valid with uniquely valued Index objects

What am I missing here? - Thanks

Use fillna \ ffill. Check this post: https://stackoverflow.com/questions/27905295/how-to-replace-nans-by-preceding-values-in-pandas-dataframe — Mike67, Aug 06 '20 at 16:56

score 0 · Answer 1 · answered Aug 06 '20 at 16:54

0

The best solution is to probably just get rid of the NaN rows instead of overwriting them. Pandas has a simple command for that:

df.dropna()

Here's the documentation for it: pandas.DataFrame.dropna

answered Aug 06 '20 at 16:54

Tyler McGoffin

13
2

score 0 · Accepted Answer · answered Aug 06 '20 at 16:56

0

Your example indicates that you want to forward-fill. YOu can do it like this (complete code):

import pandas as pd
from io import StringIO

clientdata = '''ID        N       T
A         1       Land
A         NaN     NaN       
B         2       sub
B         NaN     NaN      
C         3       Land
C         3       Land'''

df = pd.read_csv(StringIO(clientdata), sep='\s+')

df["N"] = df["N"].fillna(method="ffill")
df["T"] = df["T"].fillna(method="ffill")
print(df)

answered Aug 06 '20 at 16:56

JarroVGIT

4,291
1
17
29

Probably my fault in the way the question was asked, but the above would be helpful assuming 2 things - 1) the DF for Unique ID is in alphabetical/numeric order and 2) All Unique ID's have a least one row with the info I want to populate. I suppose I can sort the data frame to deal with the first issue, but there are about 15 (out of 85) Unique ID's that don't have the data I need, and a ffill would put erroneous data in the ones with out at least one input for Type and Location. Dropping the blank ones isn't really preferred either. – HennaKT Aug 06 '20 at 17:29
1

This is what solved my issue. Thanks for pointing me in the direction https://stackoverflow.com/questions/48816457/pandas-ffill-based-on-condition-in-another-column – HennaKT Aug 07 '20 at 17:39

Populating a column based on values in another column - pandas

2 Answers2