How to Convert a Path with Special Characters to work in a Package

Question

I'm able to get some information from a website with Python and BeautifulSoup. However, I get an error when a path contains a special character.

In Italian we have some special characters such as à, è, ì, ò and ù. If I manually replace them with a, e, i, o and u, parsing works. However, if I use BeautifulSoup and parse the page automatically, I get an error. Do you know how I can convert these characters into plain vowels?

I put the following settings at the beginning of my code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

Are you looking [to strip diacritics](http://stackoverflow.com/q/517923/364696)? The accent marks you're talking about are diacritics; it's just unclear whether that's the goal. — ShadowRanger, Dec 08 '16 at 22:47
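If stripping diacritics is the goal, it can be done with the standard library alone. The sketch below (the function name `strip_accents` is just illustrative) uses `unicodedata.normalize("NFD", ...)` to split each accented letter into its base letter plus a combining accent, then drops the combining marks. Unlike unidecode, it only removes accents; it does not transliterate other symbols such as "º".

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Replace accented letters such as 'è' with their base letter 'e'."""
    # NFD splits 'è' into 'e' followed by a combining grave accent (U+0300).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category 'Mn'), keeping the base letters.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(strip_accents("à è ì ò ù"))  # a e i o u
```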

AER · Answer 1 · 2016-12-10T05:29:13.957


Use the unidecode package. Here is a code sample showing how to use it:

from unidecode import unidecode as ud
italian_string = "L'italiano è classificato al 21º"
ud(italian_string)

The last line returns:

=> "L'italiano e classificato al 21o"

edited Dec 10 '16 at 05:29

answered Dec 09 '16 at 02:18

AER


Well, the problem is that I'm doing web scraping. The letter è was returned like this: "Ã©". If I use your approach, those characters become "A(c)". – all_key_the Dec 09 '16 at 09:42
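"Ã©" is what you get when the UTF-8 bytes of "é" (C3 A9) are decoded as Latin-1, so unidecode sees two unrelated characters, "Ã" and "©", and turns them into "A(c)". The text has to be decoded correctly before any accents are stripped. When scraping, one option is to pass the raw response bytes to BeautifulSoup (or set its `from_encoding` parameter) so the page is decoded with the right charset in the first place. The sketch below instead repairs text that has already been mis-decoded; it assumes the page was UTF-8 and was wrongly read as Latin-1, and the sample word is hypothetical.

```python
import unicodedata


def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was wrongly decoded as Latin-1 (e.g. 'Ã©' -> 'é').

    Assumes the original bytes were UTF-8; returns the text unchanged if the
    round trip is not possible (for example, if it was already decoded correctly).
    """
    try:
        # Recover the original bytes, then decode them with the right codec.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def strip_accents(text: str) -> str:
    """Remove combining accent marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


scraped = "perch\u00c3\u00a9"  # "perchÃ©", as it might arrive from the page
fixed = fix_mojibake(scraped)
print(fixed)                 # perché
print(strip_accents(fixed))  # perche
```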
Works perfectly on this: https://repl.it/languages/python3. What is the string encoded as? – AER Dec 10 '16 at 05:28
@cco So what do I have to use instead? – all_key_the Dec 14 '16 at 12:21
Please put up a complete example of what you're trying to do. When you say 'path', do you mean the path to a file, a path to an element in the document, or the trailing components of a URL? Each of these would have a different answer (and some could have more than one). Showing what you've tried and what you want to do will be a big help. – cco Dec 14 '16 at 23:12
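If "path" means the trailing components of a URL, the accented letters may not need to be removed at all: they can be percent-encoded instead. A minimal sketch, assuming a URL path and using the standard library's `urllib.parse.quote`; the path itself is hypothetical.

```python
from urllib.parse import quote, unquote

# Hypothetical URL path containing Italian accented letters.
path = "/notizie/città/perché"

# quote() percent-encodes non-ASCII characters as their UTF-8 bytes;
# "/" is left alone because it is in the default safe set.
encoded = quote(path)
print(encoded)  # /notizie/citt%C3%A0/perch%C3%A9

# unquote() reverses the encoding.
print(unquote(encoded) == path)  # True
```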

