Replace HTML-special-character-codes in Python3

Question

I receive HTML files that contain strings like &quot; ("), &#252; (ü) and so on.

I need them human-readable. I could use str.replace() for that, but isn't there a package or library for Python 3 that knows all the character codes and can handle this by itself?

score 3 · Accepted Answer · answered Jul 17 '17 at 10:57

You can use html.unescape():

import html
print(html.unescape('&quot;&#252;'))
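As a slightly longer sketch, html.unescape() handles named, decimal and hexadecimal character references in one pass, so no hand-written str.replace() table should be needed. The sample string below is a made-up input for illustration, not taken from the question:

```python
import html

# Hypothetical sample input mixing the three reference styles:
# named (&quot;, &uuml;, &amp;), decimal (&#223;) and hexadecimal (&#xFC;).
raw = "&quot;Gr&uuml;&#223;e&quot; &amp; M&#xFC;nchen"

# html.unescape() is part of the standard library (Python 3.4+).
decoded = html.unescape(raw)
print(decoded)  # "Grüße" & München
```

The same call can be applied to the text of a whole file once it has been read into a string.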

u32i64

score 1 · Answer 2 · answered Jul 17 '17 at 10:59

See the solution here. It's called decoding (or unescaping), and yes, there is a library for that.

dben

No links without explanations please. – buhtz Jul 17 '17 at 11:04
Someone asked almost the same question as you. You can find the explanation there. – dben Jul 17 '17 at 11:15
In that case you could flag the question as a duplicate. I did this myself now. ;) – buhtz Jul 17 '17 at 11:20
