
So, I'm working on a mapping application for geotagged images, and I want to include address information for the points of interest on my map. I have managed to complete most of the task using GeoPandas, GeoPy, and Nominatim with point data from a PostGIS table (e.g. POINT Z (8.726176366993529 50.10868874301912 96.90000000000001)).

While the script does most of what I want, the result contains a lot of extraneous information, and I'd like to reduce it to one or two pieces of data before updating my database. I was able to hack together my script using two articles on geocoding and reverse geocoding. My issue is that I'm not sure how the script receives the response object, or how I can access its properties either before or after they're added to my DataFrame.

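My code, with the import statements it relies on, is as follows:

```python
import geopandas as gpd
import psycopg2
from geopy.extra.rate_limiter import RateLimiter
from geopy.geocoders import Nominatim
from sqlalchemy import create_engine
from tqdm import tqdm

conn = psycopg2.connect(
    host="localhost",
    database="Nizz0k",
    user="Nizz0k",
    password="")
sql = "select * from public.\"Peng\""
# SQLAlchemy engine (not used below); the URL path names the database, not a table
engine = create_engine('postgresql://Nizz0k@localhost:5432/Nizz0k')
df = gpd.read_postgis(sql, conn, geom_col="geom")
# Shapely points expose x (longitude) and y (latitude)
df['lon'] = df.geometry.apply(lambda p: p.x)
df['lat'] = df.geometry.apply(lambda p: p.y)
# Nominatim.reverse accepts a "lat, lon" query string
df['geocode'] = df['lat'].map(str) + ', ' + df['lon'].map(str)
locator = Nominatim(user_agent="pengMappingAgent", timeout=10)
# Nominatim's usage policy allows at most 1 request per second
rgeocode = RateLimiter(locator.reverse, min_delay_seconds=1)
tqdm.pandas()  # registers progress_apply() on pandas objects
# Each cell stores whatever rgeocode returns for that row
df['address'] = df['geocode'].progress_apply(rgeocode)
```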

So, my Python knowledge is very limited, but nothing I've tried to access the properties in the newly created df['address'] column seems to be working. Calling df.head() shows the correctly created column and address information, but now I want to simplify the info in the column and extract parts of it to new columns. Ideally, I'd like to get the street and house number information and neighborhood information pulled out, and get rid of the city, county, state, and country information as it's redundant.

Based on the research I've done, I should be able to pull this information out of the response object, but I'm not sure where or how to access it. It seems that this info gets converted to a string in my column (I think), and if not, I'm not sure how to set up a loop or lambda function to get this stuff out. Worst case, I assume just some string manipulation might achieve my goal, but it seems like there should be an easier way.
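A minimal sketch of the per-row extraction described above. It assumes, per geopy's documentation, that `locator.reverse` (and therefore `rgeocode`) returns a `Location` object, or `None` when nothing is found, and that `Location.raw` holds Nominatim's JSON response as a dict with an `address` sub-dict (keys such as `road`, `house_number`, `suburb`, as in the answer's output below). If so, the column most likely holds `Location` objects that merely print as address text, not strings. `FakeLocation` and `field` are hypothetical names so the sketch runs offline without calling Nominatim.

```python
import pandas as pd


class FakeLocation:
    """Hypothetical stand-in for geopy's Location; only the .raw dict is mimicked."""

    def __init__(self, raw):
        self.raw = raw


def field(loc, key):
    """Read one key from a result's raw['address'] dict; None if absent or no result."""
    if loc is None:  # the geocoder found nothing for this point
        return None
    return loc.raw.get("address", {}).get(key)


# Roughly what df['address'] holds after progress_apply(rgeocode): objects, not strings
df = pd.DataFrame({"address": [
    FakeLocation({"address": {"road": "A 4", "suburb": "Verlautenheide", "city": "Aachen"}}),
    FakeLocation({"address": {"road": "Beieknapp", "house_number": "14", "village": "Holler"}}),
    None,
]})

# One new column per wanted key; extra positional args go to field() via args=
for key in ["road", "house_number", "suburb"]:
    df[key] = df["address"].apply(field, args=(key,))

print(df[["road", "house_number", "suburb"]])
```

With real data, the same loop runs unchanged on the `df['address']` column produced by the script above; the city, county, state, and country keys are simply never read.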

nizz0k
  • 1) Are you using [OSMPythonTools](https://github.com/mocnik-science/osm-python-tools), or GeoPy, or another method to do the query? 2) Can you [include the input data in your question?](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) We don't have access to your SQL server. – Nick ODell Jan 04 '22 at 17:46
  • So, it's just GeoPy. I provided an example of the data from the table, but it's valid geometry with x, y, z in WGS84. – nizz0k Jan 04 '22 at 18:21

1 Answer

  • Clearly I can't connect to your database, so I simulated a GeoDataFrame that is a series of points
  • then simplified your code for calling Nominatim
  • `raw` returns a dict, which can be expanded into columns as in the code below
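```python
import geopandas as gpd
import shapely.geometry
from geopy.geocoders import Nominatim
import pandas as pd

# gpd.datasets was removed in geopandas 1.0; this line needs an older geopandas
gdf = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))

# a geodata frame with a few points: the vertices of Belgium's outline
df = gpd.GeoDataFrame(
    geometry=gdf.loc[gdf["iso_a3"].eq("BEL"), "geometry"]
    .apply(lambda p: p.exterior.coords)
    .explode()
    .apply(shapely.geometry.Point),
    crs="EPSG:4326",
).reset_index(drop=True)

locator = Nominatim(user_agent="pengMappingAgent", timeout=10)

# reverse() returns a Location; .raw["address"] is a dict, and apply(pd.Series)
# expands each dict into one column per key (keys a row lacks become NaN)
df = df.join(df["geometry"].apply(lambda p: locator.reverse(f"{p.y}, {p.x}").raw["address"]).apply(pd.Series))

print(df.head(3).to_markdown(index=False))  # to_markdown() needs the tabulate package
df  # in a notebook, this displays the full frame
```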

output

| geometry | road | suburb | city | county | state | postcode | country | country_code | house_number | village | hamlet | town | region | locality | municipality | isolated_dwelling | neighbourhood | tourism |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| POINT (6.15665815595878 50.80372101501058) | A 4 | Verlautenheide | Aachen | Städteregion Aachen | Nordrhein-Westfalen | 52080 | Deutschland | de | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| POINT (6.043073357781111 50.12805166279423) | Beieknapp | nan | nan | Canton Clervaux | nan | 9962 | Lëtzebuerg | lu | 14 | Holler | nan | nan | nan | nan | nan | nan | nan | nan |
| POINT (5.782417433300907 50.09032786722122) | nan | nan | nan | Bastogne | Luxembourg | 6600 | België / Belgique / Belgien | be | nan | Noville | Neufmoulin | Bastogne | Wallonie | nan | nan | nan | nan | nan |
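Building on the `apply(pd.Series)` expansion above, a minimal sketch of trimming the result down to the fields the question asks for. The two address dicts are hand-copied from the output table rather than fetched from Nominatim. `reindex(columns=...)` is used so that a wanted key no row returned (here `neighbourhood`) still appears as an all-NaN column instead of raising a `KeyError`.

```python
import pandas as pd

# Dicts shaped like Nominatim's raw["address"] (values taken from the output above)
addresses = pd.Series([
    {"road": "A 4", "suburb": "Verlautenheide", "city": "Aachen", "country": "Deutschland"},
    {"road": "Beieknapp", "house_number": "14", "village": "Holler", "country": "Lëtzebuerg"},
])

# apply(pd.Series) turns each dict into a row; keys a row lacks become NaN
parts = addresses.apply(pd.Series)

# Keep only the wanted columns; any wanted key no row returned is added as NaN
wanted = ["road", "house_number", "neighbourhood", "suburb"]
parts = parts.reindex(columns=wanted)

# Combine road and house number into one street column, skipping NaN parts
parts["street"] = parts[["road", "house_number"]].apply(
    lambda row: " ".join(row.dropna()), axis=1
)
print(parts[["street", "neighbourhood", "suburb"]])
```

The redundant city, county, state, and country columns are dropped by the `reindex` step rather than deleted one by one.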
Rob Raymond
  • Hi Rob, thanks for your answer, but maybe I wasn't clear in my question. I have successfully performed the reverse geocoding and I get the results I expect; my question is HOW to further parse and refine the returned address information. I just want the street + house number and neighborhood information parsed out of what is returned. – nizz0k Jan 04 '22 at 18:48
  • Ah, sorry, I skimmed your answer previously. So, you're saying that the information returned is in a string, but I know the information I want is returned as an object in single examples. I assume in this case it's the code itself that is converting the information to a string and adding it to the DataFrame. So, I need to either work with my comma-separated string, or rewrite this as some sort of loop that extracts the information I want. – nizz0k Jan 04 '22 at 18:55
  • I've updated: I investigated the **raw** property, and it is a dict. To ask a question that gets what you want, provide an MWE of what you've tried that does not depend on infrastructure an SO responder cannot access. – Rob Raymond Jan 04 '22 at 19:19
  • So, this is what I'm realizing: I need to write this as a script that pulls the data from the location.raw object and loops over it. I can do what I want in a single case; it's the loop that's tripping me up. – nizz0k Jan 04 '22 at 19:50
  • I've done it in `apply()`. Your code segment makes no sense: what is `tqdm`? `progress_apply()` is not a pandas / geopandas method... plus, where do you make the batch call to reverse geocode? The whole point of `apply(pd.Series)` is that it expands a dict into columns in this context. – Rob Raymond Jan 04 '22 at 20:06
  • So, progress_apply() is a tqdm method, which just shows a progress bar while the script is running. Considering there are 1500 points to geocode, it makes sense to use here. My code works for all the points; it just returns the results in a messy format. When you run this on a single coordinate pair, Nominatim returns a JSON object (see the reverse geocoding article I linked) with full address information. I want to extract one or two bits of info from the object. That's it. The issue with my script is that the last line seems to convert the object to a string when creating the df['address'] column. – nizz0k Jan 04 '22 at 20:26
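An alternative sketch for the loop discussed in these comments: do the extraction inside the function passed to `progress_apply`, so the column never stores the full response. Whatever the applied function returns is what pandas stores; if it returns a `pd.Series`, `apply` builds one column per Series key. This assumes `reverse` behaves like the question's `rgeocode` (returns an object with a `.raw` dict, or `None`). `reverse_parts`, `_Result`, and `fake_reverse` are hypothetical names, and the stub replaces Nominatim so the sketch runs offline.

```python
import pandas as pd


def reverse_parts(query, reverse):
    """Reverse-geocode one "lat, lon" string and return only a few address fields.

    `reverse` is assumed to be a callable like the question's `rgeocode` that
    returns an object with a `.raw` dict, or None when nothing is found.
    """
    loc = reverse(query)
    addr = loc.raw.get("address", {}) if loc is not None else {}
    return pd.Series({
        "road": addr.get("road"),
        "house_number": addr.get("house_number"),
        "neighbourhood": addr.get("neighbourhood"),
    })


# Hypothetical offline stand-ins so the sketch runs without calling Nominatim
class _Result:
    def __init__(self, address):
        self.raw = {"address": address}


def fake_reverse(query):
    if query.startswith("50.12"):
        return _Result({"road": "Beieknapp", "house_number": "14"})
    return None  # simulate a point with no result


df = pd.DataFrame({"geocode": ["50.128, 6.043", "0.0, 0.0"]})
# With the real script: df["geocode"].progress_apply(lambda q: reverse_parts(q, rgeocode))
df[["road", "house_number", "neighbourhood"]] = df["geocode"].apply(
    lambda q: reverse_parts(q, fake_reverse)
)
print(df)
```

Passing the geocoder in as an argument keeps `reverse_parts` testable and leaves rate limiting to `RateLimiter`.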