Questions tagged [python-module-unicodedata]
15 questions
22
votes
1 answer
Why doesn't unicodedata recognise certain characters?
In Python 2.7 at least, unicodedata.name() doesn't recognise certain characters.
>>> from unicodedata import name
>>> name(u'\n')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: no such name
>>> name(u'a')
'LATIN…

Hammerite
- 21,755
- 6
- 70
- 91
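A minimal sketch of the behaviour described in this question, assuming only the standard library. unicodedata.name() accepts an optional second argument that is returned instead of raising ValueError when a code point has no name, as is the case for control characters such as the newline. The helper name safe_name and the placeholder string are made up for illustration.

```python
import unicodedata

def safe_name(ch, default=u"<unnamed>"):
    # name() raises ValueError for code points without a name
    # (for example control characters) unless a default is supplied.
    return unicodedata.name(ch, default)

print(safe_name(u"a"))   # a named character
print(safe_name(u"\n"))  # a control character, so the default is returned
```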
3
votes
1 answer
Determine if a unicode character exists in a unicode subset
I'd like to find a way to determine if a Unicode character exists in a standardized subset of Unicode characters, specifically Basic Latin and Latin-1. I am using Python 2 and the unicodedata module but need a solution that works in 3 as well…

rustinpeace91
- 89
- 2
- 8
3
votes
1 answer
Python convert this utf8 string to latin1
I have this UTF-8 string:
s = "Naděždaüäö"
Which I'd like to convert to a UTF-8 string which can be encoded in "latin-1" without throwing an exception. I'd like to do so by replacing every character which cannot be found in latin-1 by its closest…

Dominik Neise
- 1,179
- 1
- 10
- 23
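A hedged sketch of one way to approximate this, not necessarily the accepted answer. Characters that already encode to latin-1 are kept; anything else is decomposed with NFKD, its combining marks are dropped, and whatever still cannot be encoded becomes "?". The function name to_latin1_safe is made up, and "closest" here only means "base letter after decomposition".

```python
import unicodedata

def to_latin1_safe(text):
    out = []
    for ch in text:
        try:
            ch.encode("latin-1")
            out.append(ch)  # already representable, keep as-is
        except UnicodeEncodeError:
            # Split into base character plus combining marks, drop the marks.
            decomposed = unicodedata.normalize("NFKD", ch)
            base = u"".join(c for c in decomposed if not unicodedata.combining(c))
            # Anything still outside latin-1 is replaced by "?".
            out.append(base.encode("latin-1", "replace").decode("latin-1"))
    return u"".join(out)

s = u"Nad\u011b\u017eda\u00fc\u00e4\u00f6"  # the string from the question
print(to_latin1_safe(s))
```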
3
votes
1 answer
What is the difference between unicodedata.digit and unicodedata.numeric?
From unicodedata doc:
unicodedata.digit(chr[, default]) Returns the digit value assigned to
the character chr as integer. If no such value is defined, default is
returned, or, if not given, ValueError is raised.
unicodedata.numeric(chr[,…
user1785721
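To make the distinction concrete, the sketch below calls both functions with a default of None, so that a missing value shows up as None instead of raising ValueError. The helper name describe is hypothetical; the three sample characters are an ordinary digit, SUPERSCRIPT TWO and VULGAR FRACTION ONE HALF.

```python
import unicodedata

def describe(ch):
    # Returns (digit value, numeric value); None where no value is defined.
    return (unicodedata.digit(ch, None), unicodedata.numeric(ch, None))

for ch in (u"7", u"\u00b2", u"\u00bd"):
    print(repr(ch), describe(ch))
```

The fraction has a numeric value but no digit value, which is the main practical difference between the two functions.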
2
votes
1 answer
What are the differences between the modules unicode and unicodedata?
I have a large dataset with over 2 million rows of textual data. Now I want to remove the accents from the strings.
In the link below, two different modules are described to remove the accents:
What is the best way to remove accents in a Python…

Emil
- 1,531
- 3
- 22
- 47
2
votes
1 answer
Get a list of all Greek unicode characters
I would like to know how to obtain a list of all Greek characters (upper and lowercase letters). I know how to find specific characters (unicodedata.lookup(name)), but I want all upper and lowercase letters.
Is there any way to do this?

Microlith57
- 45
- 8
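A rough sketch of one way to build such a list, assuming Python 3 (for chr) and assuming that "Greek characters" means cased letters in the Greek and Coptic block (U+0370..U+03FF) and the Greek Extended block (U+1F00..U+1FFF) whose Unicode name starts with "GREEK". That filter skips Coptic letters but does include letter-like symbols such as GREEK THETA SYMBOL, so it may need tightening.

```python
import unicodedata

# Assumed block ranges: Greek and Coptic, Greek Extended.
GREEK_RANGES = ((0x0370, 0x03FF), (0x1F00, 0x1FFF))

def greek_letters():
    letters = []
    for start, end in GREEK_RANGES:
        for cp in range(start, end + 1):
            ch = chr(cp)
            # Lu = uppercase letter, Ll = lowercase letter.
            is_cased = unicodedata.category(ch) in ("Lu", "Ll")
            if is_cased and unicodedata.name(ch, "").startswith("GREEK"):
                letters.append(ch)
    return letters

letters = greek_letters()
print(len(letters), "".join(letters[:20]))
```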
1
vote
0 answers
UnicodeEncodeError printing Hangul characters in the terminal
This application runs on a Mac only and I'm stuck with Python 2.
I have an input string '한글' which when decoded through an online unicode converter shows as \u1112\u1161\u11ab\u1100\u1173\u11af
For my application to work, I need to convert this…

Lewis
- 41
- 6
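The excerpt is cut off, so it is not clear which conversion the asker needs. As background only: the six code points quoted are conjoining jamo, and NFC normalization composes them into the two precomposed Hangul syllables. The sketch below shows that relationship and does not address the UnicodeEncodeError itself.

```python
import unicodedata

jamo = u"\u1112\u1161\u11ab\u1100\u1173\u11af"  # decomposed form from the question
composed = unicodedata.normalize("NFC", jamo)    # precomposed syllables

print(len(jamo), len(composed))
print([hex(ord(c)) for c in composed])
```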
1
vote
2 answers
Remove special characters from string such as smileys but keep German special characters
I know how to remove unwanted characters in a string, like smileys etc. However, some languages like German have special characters, too.
This is my current code:
import unicodedata
string = "süß "
uni_str = str(unicodedata.normalize('NFKD', \
…

Kev1n91
- 3,553
- 8
- 46
- 96
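One possible alternative to normalizing with NFKD, sketched under the assumption that "special characters" to remove are those in the symbol categories (S*) while letters, digits, punctuation and separators should stay. Filtering on unicodedata.category() leaves ü and ß untouched because they are letters. The function name drop_symbols is made up, and note that this filter also drops "+" and "$", which are symbols too.

```python
import unicodedata

def drop_symbols(text):
    # Keep letters (L*), numbers (N*), punctuation (P*) and separators (Z*);
    # everything else, including emoji (category So), is removed.
    keep = ("L", "N", "P", "Z")
    return u"".join(c for c in text if unicodedata.category(c)[0] in keep)

print(drop_symbols(u"s\u00fc\u00df \U0001F600"))
```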
0
votes
1 answer
More efficient way to replace special chars with their unicode name in pandas df
I have a large pandas dataframe and would like to perform a thorough text cleaning on it. For this, I have crafted the below code that evaluates if a character is either an emoji, number, Roman numeral, or a currency symbol, and replaces these with…

lazarea
- 1,129
- 14
- 43
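The asker's code is not shown, so the following is only a sketch of the general idea on plain strings, assuming Python 3. It maps currency symbols (Sc), other symbols such as emoji (So) and letter numbers such as Roman numerals (Nl) to their Unicode names, and caches the per-character lookup so repeated characters are cheap. The category set, the padding spaces and the function names are all assumptions.

```python
import unicodedata
from functools import lru_cache

# Assumed set of categories to spell out: currency, other symbols, letter numbers.
NAMED_CATEGORIES = ("Sc", "So", "Nl")

@lru_cache(maxsize=None)
def replacement(ch):
    if unicodedata.category(ch) in NAMED_CATEGORIES:
        # Pad with spaces so the name does not fuse with neighbouring text.
        return " " + unicodedata.name(ch, "UNKNOWN") + " "
    return ch

def name_symbols(text):
    return "".join(replacement(c) for c in text)

print(name_symbols("price: 5\u20ac, chapter \u2163"))
```

On a pandas column this function could presumably be applied with Series.map, though that has not been measured here.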
0
votes
2 answers
Capture output including control characters of subprocess
I have the following simple program to run a subprocess and tee its output to both stdout and some buffer
import subprocess
import sys
import time
import unicodedata
p = subprocess.Popen(
"top",
shell=True,
stdout=subprocess.PIPE,
…

Mugen
- 8,301
- 10
- 62
- 140
0
votes
1 answer
Convert check mark in Python
I have a dataframe which has, in a certain column, a check mark (unicode: '\u2714'). I have been trying to replace it with the following command:
import unicodedata
df['Column'].str.replace(unicodedata.lookup("\u2714"), '')
But I keep on reading…

bellotto
- 445
- 3
- 13
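A short sketch of what may be going wrong, using plain strings rather than a dataframe. unicodedata.lookup() expects a character name, not the character itself, so passing the character raises KeyError. The name of U+2714 is HEAVY CHECK MARK; the sample text is made up.

```python
import unicodedata

check = unicodedata.lookup("HEAVY CHECK MARK")  # lookup takes a name
cleaned = u"done \u2714".replace(check, u"")

try:
    unicodedata.lookup(u"\u2714")  # passing the character itself
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(repr(check), repr(cleaned), lookup_failed)
```

Since the literal u"\u2714" already is the character, the lookup is optional. Also note that pandas str.replace returns a new Series, so the result has to be assigned back to the column.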
0
votes
1 answer
Understanding unistr of unicodedata.normalize()
Wikipedia basically says the following for the four values of unistr.
- NFC (Normalization Form Canonical Composition)
- Characters are decomposed
- then recomposed by canonical equivalence.
- NFKC (Normalization Form Compatibility…

user1424739
- 11,937
- 17
- 63
- 152
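For reference, the four names are values of the first argument (form) of unicodedata.normalize(); unistr is the string being normalized. The sketch below runs all four forms on a made-up two-character sample, a precomposed é followed by the "fi" ligature, so that both canonical and compatibility mappings are visible.

```python
import unicodedata

sample = u"\u00e9\ufb01"  # e with acute (precomposed) + LATIN SMALL LIGATURE FI

forms = {}
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    forms[form] = unicodedata.normalize(form, sample)
    print(form, [hex(ord(c)) for c in forms[form]])
```

The canonical forms (NFC, NFD) leave the ligature alone, while the compatibility forms (NFKC, NFKD) replace it with the letters f and i.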
0
votes
3 answers
How to remove every possible accent from a column in Python
I am new to Python. I have a data frame with a column named 'Name'. The column contains different types of accents. I am trying to remove those accents. For example, rubén => ruben, zuñiga => zuniga, etc. I wrote the following code:
import numpy as…

user3642360
- 762
- 10
- 23
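A common approach, sketched here on plain strings and not taken from the asker's truncated code: decompose each string with NFD and drop the nonspacing marks (category Mn). The function name strip_accents is hypothetical.

```python
import unicodedata

def strip_accents(text):
    # NFD splits an accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    # Category Mn (nonspacing mark) covers the combining accents.
    return u"".join(c for c in decomposed if unicodedata.category(c) != "Mn")

for name in (u"rub\u00e9n", u"zu\u00f1iga"):
    print(strip_accents(name))
```

For a dataframe column, a function like this could be passed to the column's map or apply method. Letters that have no decomposition, such as ø or ß, are left unchanged by this approach.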
-1
votes
1 answer
C++ implementation of python unicodedata library
New user here, please be gentle.
We are looking to implement a piece of Python code in C++, but it involves an intricate Unicode library called unicodedata, in particular this function:
unicodedata.category('A') # 'L'etter, 'u'ppercase
'Lu'
Any…

John Jiang
- 827
- 1
- 9
- 19
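For anyone porting this, it may help to pin down what the Python side returns first. unicodedata.category() gives the two-letter Unicode General Category, where the first letter is the major class and the second the subclass. The sample characters and the helper name below are chosen only for illustration.

```python
import unicodedata

def categories(text):
    # Map each character to its two-letter General Category code.
    return [(ch, unicodedata.category(ch)) for ch in text]

result = categories(u"Aa1 !")
print(result)
```

A C++ port would need the same property data, which comes from the Unicode Character Database and not from anything specific to Python.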
-1
votes
1 answer
How to return values from map function on dataframe
I am trying to return values from the map function, but instead it gives me the memory address. I tried using list, but then it gives me an error stating that the str object doesn't have a decode attribute. Is there a way around this?

via2
- 9
- 3