Questions tagged [python-module-unicodedata]

15 questions
22
votes
1 answer

Why doesn't unicodedata recognise certain characters?

In Python 2.7 at least, unicodedata.name() doesn't recognise certain characters. >>> from unicodedata import name >>> name(u'\n') Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: no such name >>> name(u'a') 'LATIN…
Hammerite
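
For reference, a minimal sketch of the behaviour this excerpt describes, written for Python 3 rather than the Python 2.7 the question uses. `unicodedata.name()` raises `ValueError` for a character with no assigned name (control characters such as the newline are one case) unless a default is passed as the second argument. The helper name `safe_name` and the fallback string are hypothetical, chosen only for illustration.

```python
import unicodedata

def safe_name(ch, fallback="<unnamed>"):
    """Return the Unicode name of ch, or fallback when no name is assigned."""
    # Passing a second argument to name() suppresses the ValueError.
    return unicodedata.name(ch, fallback)

print(safe_name("a"))   # LATIN SMALL LETTER A
print(safe_name("\n"))  # <unnamed>
```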
3
votes
1 answer

Determine if a unicode character exists in a unicode subset

I'd like to find a way to determine if a Unicode character exists in a standardized subset of Unicode characters, specifically Latin basic and Latin-1. I am using Python 2 and the unicodedata module but need a solution that works in 3 as well…
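
One possible approach, sketched under the assumption that "Latin basic and Latin-1" means the code points U+0000 through U+00FF (the Basic Latin and Latin-1 Supplement blocks): compare the code point directly, or try encoding with the `latin-1` codec. The function names `in_latin1` and `all_in_latin1` are hypothetical. The sketch is written for Python 3; the same logic should carry over to `unicode` objects in Python 2.

```python
def in_latin1(ch):
    """True if the single character ch lies in U+0000..U+00FF."""
    return ord(ch) <= 0xFF

def all_in_latin1(text):
    """True if every character of text can be encoded as Latin-1."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True

print(in_latin1("\u00e9"))                # True  (é, U+00E9)
print(in_latin1("\u011b"))                # False (ě, U+011B)
print(all_in_latin1("\u00fc\u00e4\u00f6"))  # True  (üäö)
```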
3
votes
1 answer

Python convert this utf8 string to latin1

I have this UTF-8 string: s = "Naděždaüäö" Which I'd like to convert to a UTF-8 string which can be encoded in "latin-1" without throwing an exception. I'd like to do so by replacing every character which cannot be found in latin-1 by its closest…
Dominik Neise
3
votes
1 answer

What is the difference between unicodedata.digit and unicodedata.numeric?

From unicodedata doc: unicodedata.digit(chr[, default]) Returns the digit value assigned to the character chr as integer. If no such value is defined, default is returned, or, if not given, ValueError is raised. unicodedata.numeric(chr[,…
user1785721
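
A small sketch of how the two functions can differ, using three sample characters picked for illustration: an ordinary digit, SUPERSCRIPT TWO (U+00B2), and VULGAR FRACTION ONE HALF (U+00BD). Passing `None` as the default avoids the `ValueError` mentioned in the documentation quoted above.

```python
import unicodedata

samples = ["5", "\u00b2", "\u00bd"]  # '5', superscript two, one half

results = {}
for ch in samples:
    # digit() returns an int; numeric() returns a float.
    results[ch] = (unicodedata.digit(ch, None), unicodedata.numeric(ch, None))
    print(repr(ch), results[ch])
```

With these samples, the fraction has a numeric value but no digit value, so `digit()` falls back to the default while `numeric()` returns 0.5.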
2
votes
1 answer

What are the differences between the modules unicode and unicodedata?

I have a large dataset with over 2 million rows of textual data. Now I want to remove the accents from the strings. In the link below, two different modules are described to remove the accents: What is the best way to remove accents in a Python…
Emil
2
votes
1 answer

Get a list of all Greek unicode characters

I would like to know how to obtain a list of all Greek characters (upper and lowercase letters). I know how to find specific characters (unicodedata.lookup(name)), but I want all upper and lowercase letters. Is there any way to do this?
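
A rough sketch of one way this could be done, assuming that scanning the Greek and Coptic block (U+0370 to U+03FF) is enough and that "Greek letter" means a character whose Unicode name starts with GREEK and whose category is `Lu` or `Ll`. The function name `greek_letters` and its default range are hypothetical; the Greek Extended block (U+1F00 to U+1FFF) could be scanned the same way by passing a different range.

```python
import unicodedata

def greek_letters(start=0x0370, end=0x03FF):
    """Collect upper- and lowercase Greek letters in the given code point range."""
    letters = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        # Unassigned code points have no name; use "" so they are skipped.
        name = unicodedata.name(ch, "")
        # Coptic letters share this block, so filter on the name prefix.
        # Accented letters and letter-like symbol variants are kept as well.
        if name.startswith("GREEK") and unicodedata.category(ch) in ("Lu", "Ll"):
            letters.append(ch)
    return letters

letters = greek_letters()
print("".join(letters))
```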
1
vote
0 answers

UnicodeEncodeError printing Hangul characters in the terminal

This application runs on a Mac only and I'm stuck with Python 2. I have an input string '한글' which, when decoded through an online Unicode converter, shows as \u1112\u1161\u11ab\u1100\u1173\u11af. For my application to work, I need to convert this…
Lewis
1
vote
2 answers

Remove special characters from string such as smileys but keep German special characters

I know how to remove unwanted characters in a string, like smileys etc. However, some languages like German have special characters, too. This is my current code: import unicodedata string = "süß " uni_str = str(unicodedata.normalize('NFKD', \ …
Kev1n91
0
votes
1 answer

More efficient way to replace special chars with their unicode name in pandas df

I have a large pandas dataframe and would like to perform a thorough text cleaning on it. For this, I have crafted the below code that evaluates if a character is either an emoji, number, Roman numeral, or a currency symbol, and replaces these with…
lazarea
0
votes
2 answers

Capture output including control characters of subprocess

I have the following simple program to run a subprocess and tee its output to both stdout and some buffer: import subprocess import sys import time import unicodedata p = subprocess.Popen( "top", shell=True, stdout=subprocess.PIPE, …
Mugen
0
votes
1 answer

Convert check mark in Python

I have a dataframe which has, in a certain column, a check mark (unicode: '\u2714'). I have been trying to replace it with the following command: import unicodedata df['Column'].str.replace(unicodedata.lookup("\u2714"), '') But, I keep on reading…
bellotto
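
A brief sketch relevant to the call shown in this excerpt: `unicodedata.lookup()` expects a character *name*, not the character itself, and U+2714 is named HEAVY CHECK MARK. The sample string below is made up for illustration, and plain `str.replace` stands in for the pandas call, which is not reproduced here.

```python
import unicodedata

# lookup() maps a name to a character; name() goes the other way.
check = unicodedata.lookup("HEAVY CHECK MARK")
print(check == "\u2714")            # True
print(unicodedata.name("\u2714"))   # HEAVY CHECK MARK

# Hypothetical sample text; the literal "\u2714" works just as well as `check`.
text = "done \u2714"
cleaned = text.replace(check, "")
print(repr(cleaned))                # 'done '
```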
0
votes
1 answer

Understanding unistr of unicodedata.normalize()

Wikipedia basically says the following for the four values of unistr. - NFC (Normalization Form Canonical Composition) - Characters are decomposed - then recomposed by canonical equivalence. - NFKC (Normalization Form Compatibility…
user1424739
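
To make the forms concrete, here is a small sketch with two sample characters chosen for illustration: "é" (U+00E9) and the ligature "ﬁ" (U+FB01). Note that in the signature `unicodedata.normalize(form, unistr)` the names NFC, NFD, NFKC and NFKD are values of `form`, while `unistr` is the string being normalized.

```python
import unicodedata

composed = "\u00e9"   # é as a single code point
ligature = "\ufb01"   # ﬁ ligature

# Canonical decomposition splits é into 'e' plus a combining acute accent.
decomposed = unicodedata.normalize("NFD", composed)
print([hex(ord(c)) for c in decomposed])   # ['0x65', '0x301']

# Canonical composition puts it back together.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == composed)              # True

# Canonical forms leave the ligature alone; compatibility forms replace it.
nfc_lig = unicodedata.normalize("NFC", ligature)
nfkc_lig = unicodedata.normalize("NFKC", ligature)
print(nfc_lig == ligature, nfkc_lig)       # True fi
```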
0
votes
3 answers

How to remove every possible accent from a column in Python

I am new to Python. I have a data frame with a column named 'Name'. The column contains different types of accents. I am trying to remove those accents. For example, rubén => ruben, zuñiga => zuniga, etc. I wrote the following code: import numpy as…
user3642360
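
A common technique for this kind of task, shown here only as a sketch on plain strings (the asker's data frame code is truncated above and is not reproduced): decompose with NFD, then drop the combining marks. The function name `strip_accents` is hypothetical. Characters that have no decomposition, such as "ø" or "ß", pass through unchanged with this approach.

```python
import unicodedata

def strip_accents(text):
    """Remove combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns 0 for base characters and non-zero for marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("rub\u00e9n"))   # ruben
print(strip_accents("zu\u00f1iga"))  # zuniga
```

For a pandas column, a function like this could presumably be applied element-wise, for example with `Series.map`.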
-1
votes
1 answer

C++ implementation of python unicodedata library

New user here, please be gentle. We are looking to implement a piece of Python code in C++, but it involves an intricate Unicode library called unicodedata, in particular this function: unicodedata.category('A') # 'L'etter, 'u'ppercase 'Lu' Any…
John Jiang
-1
votes
1 answer

How to return values from a map function on a dataframe

I am trying to return values from a map function, but instead it gives me the memory address. I tried using list, but then it gives me an error stating that a str object doesn't have an attribute decode. Is there a way out?