
I'm trying to extract all the countries mentioned in a string using GeoText.

It works fine for some sentences but fails for others.

I've tried this in Python 3.6.

from geotext import GeoText

s = "India Vs Ireland T20 Series"
places = GeoText(s)
print(places.countries)

Expected Results:

['India','Ireland']

Actual Result:

['Ireland']
Wolfgang Fahl

1 Answer


You could use pycountry for this task (it also works with Python 3):

pip install pycountry

import pycountry

text = "United States (New York), United Kingdom (London)"
for country in pycountry.countries:
    # Compare in lowercase so the match is case-insensitive
    if country.name.lower() in text.lower():
        print(country.name)
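
One caveat with plain substring matching is that a short name can match inside a longer one (for example "Niger" inside "Nigeria"). A minimal sketch of a whole-word variant follows; the helper name `find_countries` and the sample name list are made up for illustration and are not part of pycountry:

```python
import re

def find_countries(text, names):
    """Return the names that occur in text as whole words, ignoring case."""
    found = []
    for name in names:
        # \b anchors the match at word boundaries, so "Niger" does not
        # match inside "Nigeria"; re.escape protects names containing
        # regex metacharacters such as parentheses or dots.
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(name)
    return found

# Hypothetical sample list; with pycountry installed you could pass
# [c.name for c in pycountry.countries] instead.
sample_names = ["India", "Ireland", "Niger", "Nigeria"]
print(find_countries("India Vs Ireland T20 Series", sample_names))
```

Results come back in the order of the name list, not in the order they appear in the text.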
Anonymous
  • Alright, I'll try this. But do you know why geotext is so inconsistent? When I try the same approach on a different string, e.g. s = "India is my country"; st = GeoText(s); st.countries, it returns ['India']. – Akshay Sreekant Jan 24 '19 at 12:24
  • The regex geotext uses to parse your string depends heavily on capitalization. The 'candidate' strings it extracts from your string and tries to match against known location names are "India Vs", "Ireland", and "Series": it assumes the "Vs" might be part of the location name and isn't smart enough to also try just "India". – Albert Feb 02 '19 at 03:03
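
To make the comment above concrete, here is a rough sketch of how a capitalization-based candidate regex can produce exactly those candidates. The pattern and the two-entry lookup set are simplified stand-ins invented for illustration; they are not geotext's actual regex or data:

```python
import re

# Hypothetical, simplified candidate pattern (an assumption, not geotext's):
# one capitalized word, optionally followed by one more capitalized word.
CANDIDATE = re.compile(r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)?")

# Tiny illustrative lookup table; a real library ships a full index.
KNOWN_COUNTRIES = {"India", "Ireland"}

def candidates(text):
    """Return every capitalized chunk the pattern picks out of text."""
    return CANDIDATE.findall(text)

def countries(text):
    """Keep only the candidates that exactly match a known country name."""
    return [c for c in candidates(text) if c in KNOWN_COUNTRIES]

print(candidates("India Vs Ireland T20 Series"))  # ['India Vs', 'Ireland', 'Series']
print(countries("India Vs Ireland T20 Series"))   # ['Ireland']
print(countries("India is my country"))           # ['India']
```

Because "India Vs" is looked up as a single unit, "India" is never tried on its own, whereas in "India is my country" the lowercase "is" ends the candidate right after "India".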