0

I'm trying to replace null values only in column 2 with dictionary values by matching values in column 1 with keys in dictionary.

Country Rate    Tax Rate
Algeria 2.00%   76.90%
Angola  5.00%    null
Algeria 2.00%    null
Angola  5.00%    null
Algeria 2.10%   76.90%
Angola  4.90%   52.10%
Algeria 2.20%   76.90%
  1. I calculated the mean Tax Rate buy Country.
  2. I dropped all countries that didn't have a mean calculation: all values null
  3. I created a dict from results {country, TR(mean)} = (k,v)
  4. I created code that replaces values in Tax Rate to matching country in dict

NOTE: it replaces all values in Tax Rate not just null values

tr = df.groupby('country')['tax rate'].mean()
tr.dropna(inplace=True)
tr_dict = tr.to_dict()
df['tax rate'] = df['country'].apply(lambda x: tr_dict.get(x))

results

Country Rate    Tax Rate
Algeria 2.00%   mean of Algeria
Angola  5.00%   mean of Angola
Algeria 2.00%   mean of Algeria 
Angola  5.00%   mean of Angola 
Algeria 2.10%   mean of Algeria
Angola  4.90%   mean of Angola
Algeria 2.20%   mean of Algeria

I believe I am missing something at the end of this code as it is correctly replacing the values by 'country' but it is replacing all 'tax rate' values and I only need it to replace null values only.

* Expected results*

Country Rate    Tax Rate
Algeria 2.00%   76.90%
Angola  5.00%    mean of Angola
Algeria 2.00%    mean of Algeria
Angola  5.00%    mean of Angola
Algeria 2.10%   76.90%
Angola  4.90%   52.10%
Algeria 2.20%   76.90%
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • I suggest you store your numbers as a numeric type and not strings, `df['Rate'] = pd.to_numeric(df['Rate'].str.strip('%'), errors='coerce')`, if you expect to do math. – ALollz May 01 '19 at 14:09

1 Answers1

1

Try with transform and using fillna

df['tax rate']=df['tax rate'].fillna(df.groupby('country')['tax rate'].transform('mean'))
BENY
  • 317,841
  • 20
  • 164
  • 234
  • The last issue: how to I prevent mean results from converting to a scientific number? or convert it back to a int from scientific number? – SmitDataDawg May 01 '19 at 14:15
  • @SmitDataDawg https://stackoverflow.com/questions/21137150/format-suppress-scientific-notation-from-python-pandas-aggregation-results – BENY May 01 '19 at 14:16
  • Thanks again, that was exactly what I was looking for as previous searches turned up fixes that were not that simple. Thanks for your help. – SmitDataDawg May 01 '19 at 15:14