I have read a Wikipedia table into a DataFrame:
https://es.wikipedia.org/wiki/Anexo:Municipios_de_la_Comunidad_de_Madrid
import pandas as pd
from unicodedata import normalize

# read_html returns a list of DataFrames, one per table found on the page
df = pd.read_html('https://es.wikipedia.org/wiki/Anexo:Municipios_de_la_Comunidad_de_Madrid')
madrid = df[0]
# NFKD turns the \xa0 thousands separators into plain spaces, which are then removed
madrid['Población(2017)'] = madrid['Población(2017)'].apply(lambda x: normalize('NFKD', x)).str.replace(' ', '')
madrid['Población(2017)'] = pd.to_numeric(madrid['Población(2017)'])
I had to use unicodedata.normalize because the apparent spaces used to format numbers such as 206 589 were actually \xa0 (non-breaking space) characters.
Now I want to select from that DataFrame a subset of cities whose populations add up to a total as close as possible to a given number. Specifically, I would like to find which cities' populations, added together, come to just above 2,200,000 inhabitants.
I tried variations of this, without success:
madrid[madrid['Población(2017)'].sum() > 2178000]
The error message:
KeyError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2645 try:
-> 2646 return self._engine.get_loc(key)
2647 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: True
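One possible reading of the traceback: `.sum()` collapses the column to a single number, so the comparison produces one boolean rather than a per-row mask, and `madrid[True]` is then treated as a lookup of a column labelled `True`. Below is a minimal sketch of the kind of selection being asked for, assuming "just above" means adding the largest municipalities first until the running total passes the target. The toy data and the helper `select_until_target` are hypothetical, not taken from the Wikipedia table, and this greedy approach is not guaranteed to find the subset whose total is closest to the target.

```python
import pandas as pd


def select_until_target(frame, column, target):
    """Return the largest-first rows whose running total first exceeds target."""
    ordered = frame.sort_values(column, ascending=False)
    running = ordered[column].cumsum()
    # Keep a row while the total accumulated BEFORE it is still at or below
    # the target; this also keeps the row that pushes the total over it.
    mask = running.shift(fill_value=0) <= target
    return ordered[mask]


# Hypothetical toy data standing in for the real table
toy = pd.DataFrame({
    "Nombre": ["A", "B", "C", "D"],
    "Población(2017)": [600, 100, 1000, 300],
})

# The original attempt compares one number with the target: a single boolean,
# not a per-row mask, so it cannot be used to filter rows.
single_flag = toy["Población(2017)"].sum() > 1500

subset = select_until_target(toy, "Población(2017)", 1500)
print(subset)
print(subset["Población(2017)"].sum())
```

On the real data the call would presumably look like `select_until_target(madrid, 'Población(2017)', 2200000)`, once the column has been converted to numbers as above.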
Could somebody suggest a condition that selects what I want?
Thanks in advance