I found the needed data for states/regions/provinces of all the countries in the Natural Earth data - https://www.naturalearthdata.com/.
However, after working with it, I realised that this data is quite old and for many countries it has states it does not have the new states or provinces (for eg, Telengana in India was established in 2014 as a new state, yet that does not exist in this data set).
The 1:110m
only had country boundaries and boundaries for USA states. 1:50m
had country boundaries + state boundaries for North America, Australia and some other regions but not for the whole world. The 1:10m
downloads have the states for all the countries in the world. Download this file -
https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_1_states_provinces.zip
For a more updated and detailed version, use the level-1 shape file from this extract- https://biogeo.ucdavis.edu/data/gadm3.6/gadm36_levels_shp.zip

This python script extracts the bounding box value for all the states for all the countries in the world and stores it in a csv file.
import fiona
import json
import numpy as np
# sudo pip3 install git+https://github.com/ulikoehler/UliEngineering.git
from UliEngineering.Math.Coordinates import BoundingBox
import polyline as pl
import csv
# https://gis.stackexchange.com/a/113808/138032 - how to read shape file
with fiona.open("ne_10m_admin_1_states_provinces/ne_10m_admin_1_states_provinces.shp") as c:
with open('bbox_file.csv', mode='w') as bbox_file:
bbox_writer = csv.writer(bbox_file)
bbox_writer.writerow(['country', 'region', 'bbox'])
for record in c:
country = record['properties']['admin']
region = record['properties']['name'] # state/province name
print(region)
# json_object = json.dumps(record['properties'], indent=4)
# print(json_object)
# print(record['geometry']['type']) # All features are either "Polygon" or "Multipolygon"
if record['geometry']['type'] == "Polygon":
"""
Polygon always has array(array(coordinates)). The first array is always a length of 1. So just
accessing the array(coordinates)
"""
coordinates = record['geometry']['coordinates'][0]
else:
"""
A multipolygon is basically a list of polygons, so add one more layer to it, meaning
array(array(array(coordinates))). Here, the first layer of arrays might have any number of
polygons inside it. And then its the same as polygons where the first array always have a
single element. So in array(array(array(coordinates))) - the middle array element always
has a single element.
The current format is this -
[ [[(lat1, lng1), (lat2, lng2)]], [[(lat3, lng3), (lat4, lng4)]] ]
And I need to convert it to this -
[(lat1, lng1), (lat2, lng2), (lat3, lng3), (lat4, lng4)]
Just one list of coordinates. So basically, to array(coordinates). Same format as polygon.
Calling sum() for the first time, makes it into this
[ [(lat1, lng1), (lat2, lng2)], [(lat3, lng3), (lat4, lng4)] ]
So one layer or arrays is removed. Sum needs to be called again in order to remove another layer,
then it becomes this
[(lat1, lng1), (lat2, lng2), (lat3, lng3), (lat4, lng4)]
https://stackoverflow.com/a/716489/3090120
Now all the coordinates exist in a single array.
"""
coordinates = sum(record['geometry']['coordinates'], [])
coordinates = sum(coordinates, [])
# print(coordinates) # the final single array containing all coordinates
"""
Now the problem with these coordinates is that they are in the (longitude, latitude) format
but I need it in the (latitude, longitude) format. So creating a new array and storing the
tuples in flipped format.
"""
fixed_coordinates_array = []
for coord_pair in coordinates:
fixed_coordinates_array.append(coord_pair[::-1])
# print(fixed_array)
# https://techoverflow.net/2017/02/23/computing-bounding-box-for-a-list-of-coordinates-in-python/
bbox = BoundingBox(np.asarray(fixed_coordinates_array))
bbox_str = str(bbox.minx) + "," + str(bbox.miny) + "," + str(bbox.maxx) + "," + str(bbox.maxy)
print(bbox_str)
# coordinates to polyline - https://pypi.org/project/polyline/
polyline = pl.encode(fixed_coordinates_array)
# print(polyline)
bbox_writer.writerow([country, region, bbox_str])
Additionally, if you want to add the polyline to the csv, just make these changes -
bbox_writer.writerow(['country', 'region', 'bbox', 'polyline'])
..
..
bbox_writer.writerow([country, region, bbox_str, polyline])
Here is the final csv output sorted by country and region-
https://gist.github.com/kuwapa/a002b7abbeeaaa1b27fd31ac9840a5dd
If someone wants to go deeper and generate bounding box for each county in each state/province in each country, they can use the shape file exports from here - https://gadm.org/download_world.html