For a dataset of 200M GPS (lon, lat) coordinates of vessels I want to calculate an approximate distance to the nearest land or coastline, as a function called distance_to_shore, that will return the distance and country of that shore.
I'm using a shape file of country boundaries and coastlines from: http://www.naturalearthdata.com/
Some considerations are that the Oceanic pole of inaccessibility is 2688 km. So this would be the maximum possible distance from shore, this could be used to create some kind of bounding box. I want to calculate accounting for the Earth's curvature (not Euclidean), e.g. Haversine, or Vincenty method.
For this I started looking at scipy.spatial.cKDTree, but this does not allow for Haversine distance metric. On the other hand the sklearn.neighbors.BallTree, does allows for Haversine distance metric but I can't get it to work. Here is the code I have so far. N.B. the function should ideally be vectorized.
############################### SOLUTION ###############################
Thanks for all the input this is how I solved it in Python, including functions to download relevant shape files, needs some cleaning
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import shapely as sp
import cartopy.io.shapereader as shpreader
import ssl
import urllib.request
import zipfile
from shutil import rmtree
from dbfread import DBF
from scipy import spatial
from sklearn.neighbors import NearestNeighbors, BallTree
from pyproj import Proj, transform
from math import *
coastline = np.load(os.path.join(os.path.dirname(__file__),
'../data/shape_files/coast_coords_10m.npy'))
ports = np.load(os.path.join(os.path.dirname(__file__),
'../data/shape_files/ports_coords.npy'))
def extract_geom_meta(country):
'''
extract from each geometry the name of the country
and the geom_point data. The output will be a list
of tuples and the country name as the last element.
'''
geoms = country.geometry
coords = np.empty(shape=[0, 2])
for geom in geoms:
coords = np.append(coords, geom.exterior.coords, axis = 0)
country_name = country.attributes["ADMIN"]
return [coords, country_name]
def save_coastline_shape_file():
'''
store shp files locally, this functions will download
shapefiles for the whole planet.
'''
ne_earth = shpreader.natural_earth(resolution = '10m',
category = 'cultural',
name='admin_0_countries')
reader = shpreader.Reader(ne_earth)
countries = reader.records()
# extract and create separate objects
world_geoms = [extract_geom_meta(country) for country in countries]
coords_countries = np.vstack([[np.array(x[:-1]), x[-1]]
for x in world_geoms])
coastline = np.save(os.path.join(os.path.dirname(__file__),
'../data/shape_files/coast_coords_10m.npy')
, coords_countries)
print('Saving coordinates (...)')
def distance_to_shore(lon, lat):
'''
This function will create a numpy array of distances
to shore. It will contain and ID for AIS points and
the distance to the nearest coastline point.
'''
coastline_coords = np.vstack([np.flip(x[0][0], axis=1) for x in coastline])
countries = np.hstack([np.repeat(str(x[1]), len(x[0][0])) for x in coastline])
tree = BallTree(np.radians(coastline_coords), metric='haversine')
coords = pd.concat([np.radians(lat), np.radians(lon)], axis=1)
dist, ind = tree.query(coords, k=1)
df_distance_to_shore = pd.Series(dist.flatten()*6371, name='distance_to_shore')
df_countries = pd.Series(countries[ind].flatten(), name='shore_country')
return pd.concat([df_distance_to_shore, df_countries], axis=1)