I am trying to geocode addresses by applying a function to the pandas rows where the value is NaN.

import numpy as np
import pandas as pd
import mapbox
MAPBOX_KEY="xxxx"
Geocoder = mapbox.Geocoder(access_token=MAPBOX_KEY)

df = pd.DataFrame({
                   'id': [1, 2, 3],
                   'Lat': [np.nan, 33.3210, 33.5231],
                   'Lon': [-112.2131, -111.3122, np.nan],
                   'address': ['addr1','addr2','addr3']
                 })


def geocode(address, field):
    # Forward-geocode the address and parse the JSON response
    res = Geocoder.forward(address)
    resjson = res.json()

    # Coordinates are ordered [longitude, latitude]
    if field == "Lat":
        lat = resjson['features'][0]['geometry']['coordinates'][1]
        return lat

    if field == "Lon":
        lon = resjson['features'][0]['geometry']['coordinates'][0]
        return lon

# Apply function
df['Lat'] = df[df['Lat'].isnull()].apply(lambda x: geocode(x['address'], 'Lat'), axis=1)

Traceback:

f, new_axis, indexer, axis, fill_value, allow_dups, copy, consolidate, only_slice)
    668         # some axes don't allow reindexing with dups
    669         if not allow_dups:
--> 670             self.axes[axis]._validate_can_reindex(indexer)
    671 
    672         if axis >= self.ndim:

/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in _validate_can_reindex(self, indexer)
   3783         # trying to reindex on an axis with duplicates
   3784         if not self._index_as_unique and len(indexer):
-> 3785             raise ValueError("cannot reindex from a duplicate axis")
   3786 
   3787     def reindex(

ValueError: cannot reindex from a duplicate axis
kms

1 Answer

First, note that this line:

df['Lat'] = df[df['Lat'].isnull()].apply(lambda x: geocode(x['address'], 'Lat'), axis=1)

fills in only the rows where 'Lat' is null and replaces every other value in the column with NaN, because the assigned result contains only the filtered rows and the assignment aligns on the index. That is a separate problem, though, and not the cause of the exception: I tried to reproduce the issue with your data, but the error did not occur.
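The alignment effect can be seen without calling Mapbox at all. The sketch below is only an illustration: `fake_geocode` is a hypothetical stand-in that returns fixed coordinates, and the masked `.loc` assignment is one possible way to write only the missing rows, not necessarily the one you need.

```python
import numpy as np
import pandas as pd


def fake_geocode(address, field):
    # Hypothetical stand-in for the Mapbox call; returns fixed coordinates.
    coords = {"Lat": 33.0, "Lon": -112.0}
    return coords[field]


df = pd.DataFrame({
    "id": [1, 2, 3],
    "Lat": [np.nan, 33.3210, 33.5231],
    "Lon": [-112.2131, -111.3122, np.nan],
    "address": ["addr1", "addr2", "addr3"],
})

# Boolean mask of the rows that still need a latitude
missing = df["Lat"].isnull()

# Original pattern: the result only has index label 0, so after the
# assignment rows 1 and 2 are overwritten with NaN.
partial = df[missing].apply(lambda x: fake_geocode(x["address"], "Lat"), axis=1)
broken = df.copy()
broken["Lat"] = partial

# Masked assignment: only the rows selected by the mask are written,
# the existing latitudes are left alone.
fixed = df.copy()
fixed.loc[missing, "Lat"] = fixed.loc[missing, "address"].apply(
    lambda addr: fake_geocode(addr, "Lat")
)
```

Printing `broken` and `fixed` side by side shows the difference: in `broken` the two latitudes that were already present are gone, in `fixed` they are kept.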

According to the question "ValueError: cannot reindex from a duplicate axis", this error is raised when pandas has to reindex an axis that contains duplicate labels. You can check whether your dataframe has duplicated index values:

df[df.index.duplicated()]
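If that check returns rows, rebuilding the index before assigning is one way to get rid of the duplicates. A minimal sketch, assuming the duplicated labels carry no meaning (for example, they are left over from concatenating several frames); the small frame here is made up for illustration:

```python
import pandas as pd

# Made-up frame whose index contains the label 0 twice
df = pd.DataFrame(
    {"address": ["addr1", "addr2", "addr3"]},
    index=[0, 0, 1],
)

# keep=False marks every row that shares its label with another row
dupes = df[df.index.duplicated(keep=False)]
was_unique = df.index.is_unique

# Replace the index with a fresh 0..n-1 range and drop the old labels
df = df.reset_index(drop=True)
```

After `reset_index(drop=True)` every row has a unique label, so an index-aligned assignment such as `df['Lat'] = ...` no longer has to reindex a duplicated axis.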