I'm trying to convert a column of a pandas DataFrame into a list. The conversion works, but I have an issue with the encoding. I hope someone can advise me on how to handle this. I'm using Python 2.7.
I'm loading an Excel file, and it loads correctly.
I'm using the following code and get the following output:
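import os
import pandas as pd

germanStatesExcelFile = 'German_States.xlsx'
# Build the path to the Excel file located next to this script
ePath_german_states = os.path.join(os.path.dirname(__file__), germanStatesExcelFile)
german_states = pd.read_excel(ePath_german_states)
print("doc " + str(german_states))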
Output:
states
0 baden-württemberg
1 bayern
2 hessen
3 rheinland-pfalz
4 saarland
5 nordrhein-westfalen
The next step is converting the 'states' column of this DataFrame into a list, which I do with the following code:
german_states = german_states['states'].tolist()
Output:
[u'baden-w\xfcrttemberg', u'bayern', u'hessen', u'rheinland-pfalz', u'saarland', u'nordrhein-westfalen']
It seems the list is not handling the UTF-8 characters correctly, so I tried the following step:
german_states = [x.encode('utf-8') for x in german_states]
Output:
['baden-w\xc3\xbcrttemberg', 'bayern', 'hessen', 'rheinland-pfalz', 'saarland', 'nordrhein-westfalen']
I would like to get the following output:
['baden-württemberg', 'bayern', 'hessen', 'rheinland-pfalz', 'saarland', 'nordrhein-westfalen']
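In case it helps, here is a minimal sketch that reproduces both outputs without the Excel file. The hard-coded list is only my assumption of what `tolist()` returns, copied from the output above and shortened to three entries; the name `encoded` is introduced just for this example.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Assumed stand-in for german_states['states'].tolist(), so no Excel file is needed
german_states = [u'baden-w\xfcrttemberg', u'bayern', u'hessen']

# Same encoding step as in the question
encoded = [x.encode('utf-8') for x in german_states]

# Printing a whole list shows the repr() of each element
print(german_states)
print(encoded)
```

On Python 2.7 the two `print` calls show the same escaped forms as in my outputs above.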