I'm doing some real estate data cleaning and ran into a beginner-level problem that, surprisingly, I can't resolve on my own.
I have a DataFrame with NaN values in the lat and lon columns. I can approximate the correct values by imputing the mean lat and lon of the given neighborhood.
This is just an example; the actual DataFrame has more than 20k rows.
lat lon neighborhood
-34.62 -58.50 Monte Castro
-34.63 -58.36 Boca
nan nan San Telmo
I built two lists of single-entry dictionaries holding the mean lat and lon for each neighborhood (in the actual DataFrame the neighborhood column is named l3), with the following code:
neighborhood_lat = []
neighborhood_lon = []
for neighborhood in df['l3'].unique():
    lat = df[((df['l3']==neighborhood) & (df['lat'].notnull()))].mean().lat
    lon = df[((df['l3']==neighborhood) & (df['lon'].notnull()))].mean().lon
    neighborhood_lat.append({neighborhood: lat})
    neighborhood_lon.append({neighborhood: lon})
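For comparison, the same per-neighborhood means could probably be collected into one plain dictionary per coordinate with a groupby, instead of a list of single-entry dictionaries. This is only a sketch: the toy DataFrame and its values are made up for illustration, and only the column names (lat, lon, l3) come from the code above.

```python
import numpy as np
import pandas as pd

# Toy data (illustrative values only); 'l3' is the neighborhood column.
df = pd.DataFrame({
    "lat": [-34.62, -34.63, np.nan, -34.61],
    "lon": [-58.50, -58.36, np.nan, -58.37],
    "l3": ["Monte Castro", "Boca", "San Telmo", "Boca"],
})

# mean() skips NaN by default, so no explicit notnull() filter is needed.
means = df.groupby("l3")[["lat", "lon"]].mean()

# One dictionary per coordinate, keyed by neighborhood name.
neighborhood_lat = means["lat"].to_dict()
neighborhood_lon = means["lon"].to_dict()
```

A neighborhood whose rows are all NaN (San Telmo in this toy data) still ends up with a NaN mean, and rows whose l3 is itself NaN are left out of the groups by default.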
This is part of one of them:
neighborhood_lat
[{'Mataderos': -34.65278757721805},
{'Saavedra': -34.551813882357166},
{nan: nan},
{'Boca': -34.63204552441155},
{'Boedo': -34.62695442446412},
{'Abasto': -34.603728937455315},
{'Flores': -34.62757516061659},
{'Nuñez': -34.54843158034983},
{'Retiro': -34.595564030955934},
{'Almagro': -34.60692879236826},
{'Palermo': -34.58274909271148},
{'Belgrano': -34.56304387233704},
{'Recoleta': -34.592081482406854},
{'Balvanera': -34.608665174550694},
{'Caballito': -34.61749059613885}
Then I'm trying to fillna lat and lon using those dictionaries, but I can't figure out how to give fillna a condition so that it fills each missing lat and lon with the mean lat and lon of that row's neighborhood.
Expected results
lat lon neighborhood
-34.62 -58.50 Monte Castro
-34.63 -58.36 Boca
(mean lat of neighborhood) (mean lon of neighborhood) San Telmo
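To make the expected behavior concrete, here is a minimal sketch of one common pandas idiom for this kind of conditional fill: groupby with transform("mean"), passed to fillna. It assumes the column names from the example table (lat, lon, neighborhood), and the extra San Telmo rows are invented so that the neighborhood has a mean to fill from.

```python
import numpy as np
import pandas as pd

# Toy data modeled on the example table; the values are illustrative only.
df = pd.DataFrame({
    "lat": [-34.62, -34.63, np.nan, -34.60, -34.64],
    "lon": [-58.50, -58.36, np.nan, -58.40, -58.36],
    "neighborhood": ["Monte Castro", "Boca", "San Telmo", "San Telmo", "San Telmo"],
})

# For every row, the mean lat/lon of that row's neighborhood (NaN ignored),
# returned with the same index as df so it aligns row by row.
group_means = df.groupby("neighborhood")[["lat", "lon"]].transform("mean")

# fillna only touches missing cells; existing coordinates stay unchanged.
df[["lat", "lon"]] = df[["lat", "lon"]].fillna(group_means)
```

With this toy data, the NaN row for San Telmo receives the mean of the other two San Telmo rows. Rows whose neighborhood has no known coordinates at all would remain NaN.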
Thanks for your help.