-2

I am looking for bounding box data for individual states which exist within countries. For eg, for India - states like Andhra Pradesh, Karnataka etc or for Norway - bbox for Akershus, Aust-Agder etc.

One approach I know is download extracts from Geofabrik and use osmium-tool to get the bounding box data from that using this command -

osmium fileinfo -e -g data.bbox victoria-latest.osm.pbf

However, Geofabrik does not have the states for many countries like India, China, Indonesia, Austria, etc.

One alternative to that is to use extracts from download.openstreetmap.fr. Their extracts unfortunately have very incorrect bounding box values. For eg, the bounding box for Victoria state in Australia covers half of Australia. I've noticed this problem many of their extracts. Bhutan's bounding box covers 3x the lateral area. Malaysia's extract covers half of Indian ocean. This makes their extracts unusable to extract the bounding box data from.

Apart from this approach, there are also data sets like this - https://gist.github.com/graydon/11198540

However, these do not contain states and regions as you can see.

Where can I find this data? or, if possible, can I extract it from OSM in some way using Overpass? I am looking for any way or method to get this data. Even a comment pointing me in the right direction will help. Thanks.

vepzfe
  • 4,217
  • 5
  • 26
  • 46
  • 2
    as we said over on gis.se (https://gis.stackexchange.com/questions/413305/seeking-database-containing-bounding-box-data-for-all-regions-in-world) you could use the Natural Earth data for this or you could ask on http://opendata.stackexchange.com – Ian Turton Oct 07 '21 at 13:09
  • Hello @IanTurton, thanks for your comment. Replying to your original comment - I am not specifically looking for open data. I am looking for any source of this data. One way which I already know is to download extracts from Geofabrik and use osmium-tool to extract the bounding box data. However, Geofabric extracts do not have regions for many countries like India, Indonesia, China etc. OSM-FR extracts unfortunately have incorrect bounding box data. In conclusion, this is not a question specifically about open data sources, but any way to find the bounding box data for different regions. – vepzfe Oct 07 '21 at 13:27
  • Please just download the NaturalEarth data - https://www.naturalearthdata.com/downloads/110m-cultural-vectors/ – Ian Turton Oct 07 '21 at 15:39
  • Thank you. I started looking into it after your initial comment. I'm not familiar with shp files so I'm trying to figure out how to extract the needed data from it. – vepzfe Oct 07 '21 at 16:59
  • @IanTurton, I looked at all the files from the natural earth downloads in QGIS and apart from USA, data on the states/provinces for no other country exists in those files. Did I make a mistake? Please guide. Thanks. – vepzfe Oct 07 '21 at 20:25
  • Found the correct file - this contains all states for all countries - https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_1_states_provinces.zip. Now I need to figure out how to extract the bounding boxes for all of them. – vepzfe Oct 07 '21 at 20:35
  • @IanTurton Thanks a lot for pointing me in the right direction. I figured out the solution. – vepzfe Oct 09 '21 at 10:22

1 Answers1

1

I found the needed data for states/regions/provinces of all the countries in the Natural Earth data - https://www.naturalearthdata.com/.

However, after working with it, I realised that this data is quite old and for many countries it has states it does not have the new states or provinces (for eg, Telengana in India was established in 2014 as a new state, yet that does not exist in this data set).

The 1:110m only had country boundaries and boundaries for USA states. 1:50m had country boundaries + state boundaries for North America, Australia and some other regions but not for the whole world. The 1:10m downloads have the states for all the countries in the world. Download this file - https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_1_states_provinces.zip

For a more updated and detailed version, use the level-1 shape file from this extract- https://biogeo.ucdavis.edu/data/gadm3.6/gadm36_levels_shp.zip

natural_earth_1_10_states_and_provinces

This python script extracts the bounding box value for all the states for all the countries in the world and stores it in a csv file.

import fiona
import json
import numpy as np
# sudo pip3 install git+https://github.com/ulikoehler/UliEngineering.git
from UliEngineering.Math.Coordinates import BoundingBox
import polyline as pl
import csv

# https://gis.stackexchange.com/a/113808/138032 - how to read shape file
with fiona.open("ne_10m_admin_1_states_provinces/ne_10m_admin_1_states_provinces.shp") as c:

    with open('bbox_file.csv', mode='w') as bbox_file:

        bbox_writer = csv.writer(bbox_file)
        bbox_writer.writerow(['country', 'region', 'bbox'])

        for record in c:
            country = record['properties']['admin']
            region = record['properties']['name']  # state/province name
            print(region)

            # json_object = json.dumps(record['properties'], indent=4)
            # print(json_object)
            # print(record['geometry']['type'])  # All features are either "Polygon" or "Multipolygon"

            if record['geometry']['type'] == "Polygon":
                """
                Polygon always has array(array(coordinates)). The first array is always a length of 1. So just
                accessing the array(coordinates)
                """
                coordinates = record['geometry']['coordinates'][0]
            else:
                """
                A multipolygon is basically a list of polygons, so add one more layer to it, meaning
                array(array(array(coordinates))). Here, the first layer of arrays might have any number of
                polygons inside it. And then its the same as polygons where the first array always have a 
                single element. So in array(array(array(coordinates))) - the middle array element always
                has a single element.  
                
                The current format is this -  
                [ [[(lat1, lng1), (lat2, lng2)]], [[(lat3, lng3), (lat4, lng4)]] ]
                And I need to convert it to this -  
                [(lat1, lng1), (lat2, lng2), (lat3, lng3), (lat4, lng4)]
                Just one list of coordinates. So basically, to array(coordinates). Same format as polygon. 
                
                Calling sum() for the first time, makes it into this
                [ [(lat1, lng1), (lat2, lng2)], [(lat3, lng3), (lat4, lng4)] ]
                So one layer or arrays is removed. Sum needs to be called again in order to remove another layer, 
                then it becomes this
                [(lat1, lng1), (lat2, lng2), (lat3, lng3), (lat4, lng4)]
                https://stackoverflow.com/a/716489/3090120
                
                Now all the coordinates exist in a single array. 
                """
                coordinates = sum(record['geometry']['coordinates'], [])
                coordinates = sum(coordinates, [])

            # print(coordinates)  # the final single array containing all coordinates

            """
            Now the problem with these coordinates is that they are in the (longitude, latitude) format
            but I need it in the (latitude, longitude) format. So creating a new array and storing the 
            tuples in flipped format.
            """
            fixed_coordinates_array = []
            for coord_pair in coordinates:
                fixed_coordinates_array.append(coord_pair[::-1])

            # print(fixed_array)

            # https://techoverflow.net/2017/02/23/computing-bounding-box-for-a-list-of-coordinates-in-python/
            bbox = BoundingBox(np.asarray(fixed_coordinates_array))
            bbox_str = str(bbox.minx) + "," + str(bbox.miny) + "," + str(bbox.maxx) + "," + str(bbox.maxy)
            print(bbox_str)

            # coordinates to polyline - https://pypi.org/project/polyline/
            polyline = pl.encode(fixed_coordinates_array)
            # print(polyline)

            bbox_writer.writerow([country, region, bbox_str])

Additionally, if you want to add the polyline to the csv, just make these changes -

bbox_writer.writerow(['country', 'region', 'bbox', 'polyline'])
..
..
bbox_writer.writerow([country, region, bbox_str, polyline])

Here is the final csv output sorted by country and region- https://gist.github.com/kuwapa/a002b7abbeeaaa1b27fd31ac9840a5dd

If someone wants to go deeper and generate bounding box for each county in each state/province in each country, they can use the shape file exports from here - https://gadm.org/download_world.html

vepzfe
  • 4,217
  • 5
  • 26
  • 46