1

I want to fill missing value base on other columns in pandas. Here is my table:

Gender     Married
Male       Yes
Male       Yes
Female     No
Female     No
Male       NaN
Female     NaN

I to fill missing value of Married field by if Gender is Male -> Married is Yes, else Married is No:

df['Married'].fillna(df[df['Married'].isnull()].apply(lambda x: 'Yes' if (df[df['Married'].isnull()]['Gender'] is 'Male') else 'No', axis=1), inplace=True)

But it was fail, I try a lot of way and I get nothing as my expectation. I hope receive from all of you.

jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
Huy Truong
  • 25
  • 1
  • 6

1 Answers1

2

I believe you need map with dictionary only in filtered rows:

mask = df['Married'].isnull()
df.loc[mask, 'Married'] = df.loc[mask, 'Gender'].map({'Male':'Yes', 'Female':'No'})
print (df)
   Gender Married
0    Male     Yes
1    Male     Yes
2  Female      No
3  Female      No
4    Male     Yes
5  Female      No

Another solution with numpy.where:

mask = df['Married'].isnull()
df.loc[mask, 'Married'] = np.where(df.loc[mask, 'Gender']  == 'Male', 'Yes','No')
print (df)
   Gender Married
0    Male     Yes
1    Male     Yes
2  Female      No
3  Female      No
4    Male     Yes
5  Female      No

Another solution with fillna of mapped Series:

df['Married'] = df['Married'].fillna(df['Gender'].map({'Male':'Yes', 'Female':'No'}))
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Hi @jezrael, In the table with a lot of value, Gender Male also has No in Married but mode(Gender=Male) is Yes, so I want to fill NaN for Male is Yes and I don't want to modify data has value already. My explain is clear for you? – Huy Truong Oct 17 '18 at 11:47
  • @HuyTruong - Sure, all 3 new solutions do it. – jezrael Oct 17 '18 at 11:49
  • thank you so much, it work for me, I save a lot of time. – Huy Truong Oct 17 '18 at 11:51
  • 1
    I have done it. Once again, thank you – Huy Truong Oct 17 '18 at 12:00
  • Hi @jezrael, I have once more question, can you help me? – Huy Truong Oct 17 '18 at 12:26
  • @HuyTruong - Sure, no problem. – jezrael Oct 17 '18 at 12:26
  • I want to statistics value of one field base on some other field. Ex: Field I want to collect: Dependent. Three fields index=[Gender, Married, Education] – Huy Truong Oct 17 '18 at 12:30
  • @HuyTruong - Sorry, not understand. – jezrael Oct 17 '18 at 12:30
  • Gender Married Education Dependent Male No Graduated 0 Female Yes Not Gra 1 Female No Graduated 0 Male No Graduated 1 Male No Graduated NaN I want to collect Dependent value with 3 field remain. Sorry I don't know how to create table – Huy Truong Oct 17 '18 at 12:36
  • @HuyTruong - Is possible edit question - sample data with expected output? Because bad format of data in comemnts. – jezrael Oct 17 '18 at 12:41
  • Hi @jezrael, here is [link data](https://imgur.com/1wbBOR1). I want to statistics Dependent value like that – Huy Truong Oct 17 '18 at 12:47
  • Hi @jezrael, this is my new image for clearly [data](https://imgur.com/Jjn5lGG). – Huy Truong Oct 17 '18 at 12:54
  • I am not statistics man, what is x value in last column? – jezrael Oct 17 '18 at 12:57
  • I have uploaded new image, example: Male -> No -> Graduated -> Dependent: 1, Male -> Yes-> Graduated -> Dependent: 0, Male -> No -> NOt Gra -> Dependent: 0 etc .. – Huy Truong Oct 17 '18 at 13:03
  • Sorry, simple to say that I want to sum Dependent value and create table like image. One more thing is count most Dependent value appear by mode() function and create the same table. My English is not good, if it not clear I will try more – Huy Truong Oct 17 '18 at 13:14