
I have two lists, one of city names and one of country names, and I would like to check which city belongs to which country. What is the easiest way to achieve that in Python?

Please note that so far I have used GeoText to extract city and country names from a text, but it doesn't tell me which city belongs to which country.

The problem can't be solved manually because the lists are long.

For example:

country_list = ['china', 'india', 'canada', 'america', ...]
city_list = ['Mocoa', 'March', 'San Miguel', 'Neiva', 'Naranjito', 'San Fernando',
             'Alliance', 'Progreso', 'NewYork', 'Toronto', ...]
martineau
I. A
  • City names are not unique to countries. E.g. Paris, France and Paris, Texas, US. – Barmar Jan 12 '21 at 17:03
  • You need a database or dictionary that lists the relationships between cities and countries. You can't do what you want with just the two lists you show. – Barmar Jan 12 '21 at 17:05
  • Make each country a `set` of cities which will allow you to write `if city in country: …` and it will execute very fast. – martineau Jan 12 '21 at 17:40
  • You are trying to recover the context of _which city belongs to which country_. However, while _city-to-country_ is not a unique one-to-one mapping, you also have lost the context while using GeoText. You need a way of extracting a city-to-country mapping somehow. Your other option is (if you are really out of options) to create a many-to-many mapping: which means if a city CT1 belongs to countries (CN1, CN2, CN4), then you provide that information as well. – CypherX Jan 12 '21 at 18:32
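
The suggestions in the comments above can be sketched roughly as follows. This is a minimal example, assuming you can get a reference table of (city, country) pairs from somewhere; `KNOWN_PAIRS`, `build_mappings` and `match_cities` are hypothetical names, and the few pairs are hand-written for illustration, not a real dataset. It keeps a `set` of cities per country and a many-to-many mapping from each city to its candidate countries.

```python
from collections import defaultdict

# Hypothetical reference data: in practice this would be loaded from a
# gazetteer of your choice; these pairs are only for illustration.
KNOWN_PAIRS = [
    ("paris", "france"),
    ("paris", "united states"),
    ("toronto", "canada"),
    ("neiva", "colombia"),
]

def build_mappings(pairs):
    """Return (country -> set of cities, city -> set of countries)."""
    cities_by_country = defaultdict(set)
    countries_by_city = defaultdict(set)
    for city, country in pairs:
        cities_by_country[country].add(city)
        countries_by_city[city].add(country)
    return cities_by_country, countries_by_city

def match_cities(city_list, country_list, countries_by_city):
    """Map each input city to the listed countries it may belong to."""
    wanted = {country.lower() for country in country_list}
    result = {}
    for city in city_list:
        # .get() avoids creating empty entries for unknown cities.
        candidates = countries_by_city.get(city.lower(), set())
        result[city] = sorted(candidates & wanted)
    return result

cities_by_country, countries_by_city = build_mappings(KNOWN_PAIRS)
matches = match_cities(["Paris", "Toronto", "Mocoa"],
                       ["france", "canada", "united states"],
                       countries_by_city)
print(matches)
```

A city that is missing from the reference data maps to an empty list, and an ambiguous name such as Paris keeps every candidate country instead of silently picking one.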

2 Answers


You can try this code:

import re

import requests

city_list = ['Jerusalem', 'Tel-Aviv', 'New York', 'London', 'Madrid', 'Alliance',
             'Mocoa', 'March', 'San Miguel', 'Neiva', 'Naranjito', 'San Fernando',
             'Alliance', 'Progreso', 'NewYork', 'Toronto']
city_country_dict = {}
country_city_dict = {}
for city in city_list:
    # Pass the city as a query parameter so requests URL-encodes it.
    response = requests.get("https://www.geonames.org/search.html",
                            params={"q": city, "country": ""})
    # Find the first link to a country page and capture its last path segment.
    match = re.search(r"/countries/(?:[^/\"]+/)*([^/\"]+)\.html", response.text)
    if match is None:
        continue  # no country link found for this city
    country = match.group(1)
    country_city_dict.setdefault(country, []).append(city)
    city_country_dict[city] = country

This code sends a request to GeoNames for each city name and takes the first link to a country page in the response. You could use BeautifulSoup instead of a regex to make it more elegant. Note that on a large list it takes a while, because every city waits for a response from GeoNames.

Example output:

city_country_dict = {'Jerusalem': 'israel', 'Tel-Aviv': 'israel', 'New York': 'united-states', 'London': 'united-kingdom', 'Madrid': 'spain', 'Alliance': 'united-states', 'Mocoa': 'colombia', 'March': 'switzerland', 'San Miguel': 'el-salvador', 'Neiva': 'colombia', 'Naranjito': 'puerto-rico', 'San Fernando': 'trinidad-and-tobago', 'Progreso': 'honduras', 'NewYork': 'united-kingdom', 'Toronto': 'canada'}


country_city_dict = {'israel': ['Jerusalem', 'Tel-Aviv'], 'united-states': ['New York', 'Alliance', 'Alliance'], 'united-kingdom': ['London', 'NewYork'], 'spain': ['Madrid'], 'colombia': ['Mocoa', 'Neiva'], 'switzerland': ['March'], 'el-salvador': ['San Miguel'], 'puerto-rico': ['Naranjito'], 'trinidad-and-tobago': ['San Fernando'], 'honduras': ['Progreso'], 'canada': ['Toronto']}
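
As a rough illustration of replacing the regex with an HTML parser, here is a sketch that uses the standard library's `html.parser` rather than BeautifulSoup, so it needs no extra install. The class `FirstCountryLink`, the helper `first_country` and the sample markup are hypothetical; the sketch assumes country links look like `/countries/IL/israel.html`, which you should check against the actual page.

```python
from html.parser import HTMLParser

class FirstCountryLink(HTMLParser):
    """Remember the slug of the first <a href="/countries/...html"> seen."""

    def __init__(self):
        super().__init__()
        self.country = None

    def handle_starttag(self, tag, attrs):
        # Ignore non-links, and everything after the first hit.
        if tag != "a" or self.country is not None:
            return
        href = dict(attrs).get("href") or ""
        if href.startswith("/countries/") and href.endswith(".html"):
            # "/countries/IL/israel.html" -> "israel"
            self.country = href.rsplit("/", 1)[-1][:-len(".html")]

def first_country(html_text):
    """Return the slug of the first country link, or None if there is none."""
    parser = FirstCountryLink()
    parser.feed(html_text)
    return parser.country

# Hypothetical markup in the assumed link format, for illustration only.
sample = ('<td><a href="/maps/x.html">map</a> '
          '<a href="/countries/IL/israel.html">Israel</a></td>')
print(first_country(sample))
```

Slicing off the suffix with `[:-len(".html")]` removes exactly that ending, whereas `str.strip(".html")` removes any of the characters `.`, `h`, `t`, `m`, `l` from both ends of the string.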
jonathan

You can write a Python script that fetches the city info via one of the free APIs. One option that I recommend is https://tequila.kiwi.com, provided by Kiwi.com for free. You can query their Locations API with the 'term' parameter, which returns the full details of the highest-ranked matching city, based on search volume. One of the fields of the returned database entry is the country.
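
A minimal sketch of that approach, using only the standard library. Everything specific here is an assumption to verify against the Tequila documentation: the endpoint URL, the `apikey` header, the `location_types` and `limit` parameters, and the response shape (a `locations` list whose entries carry `country.name`). The function names are hypothetical, and the network call is defined but not executed.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; check the Tequila documentation before relying on it.
LOCATIONS_URL = "https://tequila-api.kiwi.com/locations/query"

def build_request(term, api_key):
    """Build a Locations API request for the top-ranked city matching term."""
    query = urllib.parse.urlencode(
        {"term": term, "location_types": "city", "limit": 1})
    # The "apikey" header name is an assumption as well.
    return urllib.request.Request(f"{LOCATIONS_URL}?{query}",
                                  headers={"apikey": api_key})

def country_from_payload(payload):
    """Pull the country name out of a decoded response (assumed shape)."""
    locations = payload.get("locations") or []
    if not locations:
        return None
    return (locations[0].get("country") or {}).get("name")

def lookup_country(term, api_key):
    """Perform the network call; defined here but not run in this sketch."""
    request = build_request(term, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return country_from_payload(json.load(response))

# Hand-written payload in the assumed shape, to show the parsing step only.
example_payload = {
    "locations": [{"name": "Toronto", "country": {"name": "Canada"}}]
}
print(country_from_payload(example_payload))
```

Because the API returns the highest-ranked match, an ambiguous name resolves to the most searched city, which may not be the one your text meant.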

Coeus