I am having problems to use the proper pandas function to drop rows in dataframe of duplicate value of a key inside a dict in one of its column lugar.
This is the data of the dataframe:
{'calculado': {73683: '2021-05-27 00:43:46',
83767: '2021-05-27 00:43:46',
103395: '2021-05-27 00:43:46',
105314: '2021-05-27 00:43:46',
116555: '2021-05-27 00:43:46',
120764: '2021-05-27 00:43:46',
120892: '2021-05-27 00:43:46',
122760: '2021-05-27 00:43:46',
124269: '2021-05-27 00:43:46',
125707: '2021-05-27 00:43:46'},
'geom': {73683: '17.649999999999995,-93.65',
83767: '15.55,-93.25',
103395: '11.45,-98.45',
105314: '11.049999999999997,-98.55',
116555: '8.75,-78.45',
120764: '7.849999999999997,-89.54999999999998',
120892: '7.849999999999997,-76.75',
122760: '7.449999999999998,-81.95',
124269: '7.149999999999999,-75.04999999999998',
125707: '6.849999999999998,-75.25'},
'lat': {73683: 17.649999999999995,
83767: 15.55,
103395: 11.45,
105314: 11.049999999999997,
116555: 8.75,
120764: 7.849999999999997,
120892: 7.849999999999997,
122760: 7.449999999999998,
124269: 7.149999999999999,
125707: 6.849999999999998},
'lon': {73683: -93.65,
83767: -93.25,
103395: -98.45,
105314: -98.55,
116555: -78.45,
120764: -89.54999999999998,
120892: -76.75,
122760: -81.95,
124269: -75.04999999999998,
125707: -75.25},
'lugar': {73683: {'distancia': 12.55,
'mensaje': '13 kms. de Huimanguillo, Tabasco, México',
'nombre': 'Huimanguillo, Tabasco, México',
'pais': 'mx'},
83767: {'distancia': 16.74,
'mensaje': '17 kms. de Pijijiapan, Chiapas, 30540, México',
'nombre': 'Pijijiapan, Chiapas, 30540, México',
'pais': 'mx'},
103395: 'Mar abierto',
105314: 'Mar abierto',
116555: {'distancia': 6.7,
'mensaje': '7 kms. de Río Congo Arriba, Distrito Santa Fe, Darién, Panamá',
'nombre': 'Río Congo Arriba, Distrito Santa Fe, Darién, Panamá',
'pais': 'pa'},
120764: 'Mar abierto',
120892: {'distancia': 5.83,
'mensaje': '6 kms. de Veraguas, Panamá',
'nombre': 'Veraguas, Panamá',
'pais': 'co'},
122760: {'distancia': 100.26,
'mensaje': '100 kms. de Veraguas, Panamá',
'nombre': 'Veraguas, Panamá',
'pais': 'pa'},
124269: {'distancia': 12.09,
'mensaje': '12 kms. de Anorí, Nordeste, Antioquia, Región Andina, 052857, Colombia',
'nombre': 'Anorí, Nordeste, Antioquia, Región Andina, 052857, Colombia',
'pais': 'co'},
125707: {'distancia': 4.03,
'mensaje': '4 kms. de Guadalupe, Norte, Antioquia, Región Andina, Colombia',
'nombre': 'Guadalupe, Norte, Antioquia, Región Andina, Colombia',
'pais': 'co'}},
'valor': {73683: 198,
83767: 198,
103395: 197,
105314: 198,
116555: 198,
120764: 198,
120892: 198,
122760: 198,
124269: 196,
125707: 198},
'variable': {73683: 'T',
83767: 'T',
103395: 'T',
105314: 'T',
116555: 'T',
120764: 'T',
120892: 'T',
122760: 'T',
124269: 'T',
125707: 'T'}}
As you can see, the lugar column has a dictionary and one of the keys is nombre in this case the value: Veraguas, Panamá is duplicated, I will like to drop duplicates rows of dataframe and keep one row only per name for the rows that has the dict and key in lugar column.
One approach I have tried is to create a new column with the value of the key and then run drop_duplicates but I am unable to get the value inside the column. but I am able to get it for the first row like this
df_asc['lugar'].iloc[0]['nombre']
-> Huimanguillo, Tabasco, México
Is there a way to do this without looping the df doing it manually? I am really new to Python and Pandas.
EDITED: Expected result I converted to csv to be able to delete in spreadsheet as I am unable to do it with pandas...