I'm trying to split a DataFrame into four parts and impute rounded mean values for each part using fillna(). There are two columns I want to filter on, main_campus and degree_type, and each has two unique values, so between them I should be able to filter the DataFrame into four groups.
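To make the four groups concrete, here is a minimal sketch on made-up data. The `toy` frame and its rows are purely illustrative assumptions, not the real dataset; only the column names main_campus and degree_type come from the description above.

```python
from itertools import product

import pandas as pd

# Hypothetical toy data standing in for the real DataFrame.
toy = pd.DataFrame({
    "main_campus": [1, 1, 0, 0, 1],
    "degree_type": ["BA", "BS", "BA", "BS", "BA"],
})

# Two binary columns give 2 x 2 = 4 combinations, each with its own mask.
group_sizes = {}
for campus, degree in product([1, 0], ["BA", "BS"]):
    mask = (toy["main_campus"] == campus) & (toy["degree_type"] == degree)
    group_sizes[(campus, degree)] = int(mask.sum())

print(group_sizes)
```

Every row falls into exactly one of the four masks, so the group sizes add up to the length of the frame.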
I first did this with a triple for loop (see below), which seems to work. When I tried to do it in a more elegant way, I got a SettingWithCopyWarning that I couldn't fix by using .loc or .copy(), and the missing values weren't filled even when inplace was set to True. Here's the code for the latter method:
# Imputing mean values for main campus BA students
df[(df.main_campus == 1) &
(df.degree_type == 'BA')] = df[(df.main_campus == 1) &
(df.degree_type == 'BA')].fillna(
df[(df.main_campus == 1) &
(df.degree_type == 'BA')
].mean(),
inplace=True)
# Imputing mean values for main campus BS students
df[(df.main_campus == 1) &
(df.degree_type == 'BS')] = df[(df.main_campus == 1) &
(df.degree_type == 'BS')].fillna(
df[(df.main_campus == 1) &
(df.degree_type == 'BS')
].mean(),
inplace=True)
# Imputing mean values for downtown campus BA students
df[(df.main_campus == 0) &
(df.degree_type == 'BA')] = df[(df.main_campus == 0) &
(df.degree_type == 'BA')].fillna(
df[(df.main_campus == 0) &
(df.degree_type == 'BA')
].mean(),
inplace=True)
# Imputing mean values for downtown campus BS students
df[(df.main_campus == 0) &
(df.degree_type == 'BS')] = df[(df.main_campus == 0) &
(df.degree_type == 'BS')].fillna(
df[(df.main_campus == 0) &
(df.degree_type == 'BS')
].mean(),
inplace=True)
I should mention that the previous code went through several iterations: I tried it without assigning the result back to the slice, with and without inplace, etc.
Here's the code with the triple for loop that works:
imputation_cols = [...]  # all the columns I want to impute
for col in imputation_cols:
    for i in [1, 0]:
        for path in ['BA', 'BS']:
            group = df.loc[((df.main_campus == i) &
                            (df.degree_type == path)), :]
            group = group.fillna(value=round(group.mean()))
            df.loc[((df.main_campus == i) &
                    (df.degree_type == path)), :] = group
It's worth mentioning that I think going through the group variable in the triple for loop code is also what gets the filled NaN values actually set back into the DataFrame, but I would need to double-check this.
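A minimal sketch of that suspicion about the group variable, on made-up data. The `toy` frame and the `gpa` column are hypothetical, and only one of the four groups is shown. It checks whether the original frame changes before and after the filled group is assigned back through .loc.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; "gpa" is an invented column to impute.
toy = pd.DataFrame({
    "main_campus": [1, 1, 1, 0, 0, 0],
    "degree_type": ["BA", "BA", "BA", "BS", "BS", "BS"],
    "gpa": [3.2, 3.6, np.nan, 2.0, 2.9, np.nan],
})

mask = (toy["main_campus"] == 1) & (toy["degree_type"] == "BA")

# Boolean indexing hands back a new object, and fillna() without inplace
# returns another new object, so `toy` itself is not modified here.
group = toy.loc[mask, ["gpa"]]
group = group.fillna(group.mean().round())
missing_before = int(toy.loc[mask, "gpa"].isna().sum())

# Assigning the filled group back through .loc is the step that updates `toy`.
toy.loc[mask, ["gpa"]] = group
missing_after = int(toy.loc[mask, "gpa"].isna().sum())

print(missing_before, missing_after)
```

In this sketch the missing value in the selected group only disappears from `toy` after the .loc assignment, and rows outside the mask are left alone.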
Does anyone have an idea for what's going on here?