I'm hoping someone can relieve me of my ignorance here: I'm currently using Python 3.6.4 and trying to convert strings to simple alphanumerics.
I've got the how mostly sorted until I get to characters with diacritics. It involves football team names, so I'm looking to convert, by way of example, 1. FC Köln to 1fckoln. So:
import requests
c = requests.get(the_url)
content = c.text
#code here to extract team name into variable 'ht'
ht = simpname(ht)
def simpname(who):
    punct = "' .-/\\°()"
    the_o = 'òóôõöÖøØ'
    for p in punct:
        if p in who:
            who = who.replace(p, '')
    # note: this tests for the whole string the_o, not for each character in it
    if the_o in who:
        who = who.replace(the_o, 'o')
    who = who.lower()
    return who
(NB: code cut down for the example, I'm handling a, e, etc. in the same fashion)
The only problem here is that, in my example, the text is arriving as 1. FC KÃ¶ln. I know I've got a character encoding issue, but I can't seem to get it into the right state. Can someone suggest a way around my issue?
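For reference, the garbling can be reproduced without any network call. This is only a minimal sketch, and it assumes the server is sending UTF-8 bytes that end up decoded as Latin-1 somewhere along the way; the variable names are made up for the illustration.

```python
# Assumption: the page is UTF-8, but its bytes get decoded as Latin-1.
original = '1. FC Köln'
raw_bytes = original.encode('utf-8')     # 'ö' becomes the two bytes \xc3 \xb6
garbled = raw_bytes.decode('latin-1')    # each byte is read as its own character

print(repr(garbled))                     # the 'ö' now shows up as two characters

# Reversing the wrong decode step recovers the intended text.
repaired = garbled.encode('latin-1').decode('utf-8')
print(repaired == original)
```

If that assumption holds, the two-character sequence in place of the ö is exactly what a Latin-1 reading of UTF-8 bytes looks like.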
Solved! Thank you to @Idlehands and the commenters below for their advice. Below is the same code with the updates applied, so that future readers can see the difference.
import requests
incoming = requests.get(the_url)
cinput = incoming.content
cinput = cinput.decode('iso-8859-1')
cinput = str(cinput)
# more code, eventually extracts a team name under 'ht'
ht = simpname(ht)
...
def simpname(who):
    punct = "' .-/\\°()"
    the_o = 'òóôõöÖøØ'
    # who is currently 1. FC KÃ¶ln
    who = who.encode('latin-1')  # who becomes b'1. FC K\xc3\xb6ln'
    who = who.decode('utf-8')  # who becomes '1. FC Köln'
    for p in punct:
        if p in who:
            who = who.replace(p, '')
    # replace each variant of 'o' individually
    for an_o in the_o:
        if an_o in who:
            who = who.replace(an_o, 'o')
    who = who.lower()
    return who
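As a possible alternative to keeping a separate list for every letter, the standard library's unicodedata module can split most accented letters into a base letter plus a combining mark. The following is a rough sketch rather than a drop-in replacement: the name simpname_normalized and the SPECIAL table are hypothetical, and it assumes the input string has already been decoded correctly. Letters such as ø have no Unicode decomposition, so they still need an explicit mapping.

```python
import string
import unicodedata

# Characters we keep in the final result.
ALLOWED = set(string.ascii_lowercase + string.digits)

# Hypothetical table for letters that NFKD does not decompose.
SPECIAL = str.maketrans({'ø': 'o', 'Ø': 'o', 'ß': 'ss'})


def simpname_normalized(who):
    """Reduce a correctly decoded name to lowercase ASCII letters and digits."""
    who = who.translate(SPECIAL)
    # NFKD splits e.g. 'ö' into 'o' followed by a combining diaeresis.
    decomposed = unicodedata.normalize('NFKD', who)
    # Lowercase, then drop everything that is not a-z or 0-9
    # (punctuation, spaces, and the combining marks).
    return ''.join(ch for ch in decomposed.lower() if ch in ALLOWED)


print(simpname_normalized('1. FC Köln'))   # 1fckoln
print(simpname_normalized('Brøndby IF'))   # brondbyif
```

The trade-off is that the whitelist silently drops any character it does not know how to map, so unusual letters may need adding to the SPECIAL table by hand.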