
How can I get the Unicode code point [1] of any single character in Python or in the shell?

I’d like to have (see here for distinguishing between "plus" and "full plus" signs):

getUTF8('＋')
> U+FF0B
getUTF8('+')
> U+002B

[1] Correct terminology, as per the comments.

kotchwane
  • NB. the format is a bit different in the duplicate but the logic is the same, please let me know if you think this is not a duplicate – mozway Jan 06 '22 at 10:59
  • The point is the format! It might be difficult to relate `b'\\u3232'` with its UTF-code syntax. I think this isn’t a duplicate. – kotchwane Jan 06 '22 at 11:03
  • @kotchwane reopened, but what is your problem since you provided a solution? – mozway Jan 06 '22 at 11:20
  • Terminology notice: It seems that whatever you are looking for, it is not UTF-8, which is a variable-length encoding. For example, the "full plus" sign would be 3 bytes (EF BC 8B) in [UTF-8](https://en.wikipedia.org/wiki/UTF-8). – Ture Pålsson Jan 06 '22 at 11:44
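
To make the distinction from the last comment concrete, here is a minimal Python sketch that prints both the code point and the UTF-8 bytes of a character. The helper name `describe` is hypothetical and only used for illustration; the fullwidth plus sign is written as the escape `\uFF0B` so that it cannot be confused with the ASCII plus.

```python
def describe(ch):
    """Return (code point label, UTF-8 bytes as hex) for a single character."""
    # ord() gives the Unicode code point, independent of any encoding.
    code_point = "U+{:04X}".format(ord(ch))
    # encode() gives the actual UTF-8 byte sequence, which varies in length.
    utf8_hex = " ".join("{:02X}".format(b) for b in ch.encode("utf-8"))
    return code_point, utf8_hex


for ch in ["\uFF0B", "+"]:
    print(describe(ch))
# ('U+FF0B', 'EF BC 8B')
# ('U+002B', '2B')
```

The code point is a single number per character, while the UTF-8 form takes three bytes for the fullwidth plus and one byte for the ASCII plus.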

2 Answers


Using bash, zsh or ksh93 with a UTF-8 aware locale:

$ printf "U+%04X\n" "'＋" "'+"
U+FF0B
U+002B

When the builtin printf(1) of these shells sees a numeric format specifier (like %X) and the first character of the corresponding argument (after the usual shell word splitting and parsing) is a single or double quote, it uses the code point value of the following character as the argument, instead of the character itself.

Shawn

Here’s a Python version:

def code_point(c):
    # ord() returns the code point; {:04X} pads it to at least 4 uppercase hex digits
    return "U+{:04X}".format(ord(c))

With the above example:

for c in ['＋', '+']:
    print(code_point(c))

> U+FF0B
> U+002B
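
Note that `ord()` only accepts a single character and raises a `TypeError` for longer strings. As a hedged sketch of one way to handle that, the hypothetical helpers below (`code_points` and `describe_char` are names made up for this example) list the code point of every character in a string and look up the official character name with the standard `unicodedata` module:

```python
import unicodedata


def code_points(text):
    """Return the code point label of every character in text."""
    return ["U+{:04X}".format(ord(c)) for c in text]


def describe_char(c):
    """Return the code point label followed by the Unicode character name."""
    # The second argument to unicodedata.name() is a fallback for unnamed characters.
    return "U+{:04X} {}".format(ord(c), unicodedata.name(c, "<unnamed>"))


print(code_points("e\u0301"))   # 'e' followed by a combining acute accent
print(describe_char("\uFF0B"))  # fullwidth plus, written as an escape
print(describe_char("+"))
# ['U+0065', 'U+0301']
# U+FF0B FULLWIDTH PLUS SIGN
# U+002B PLUS SIGN
```

Printing the name alongside the code point makes lookalike characters easier to tell apart.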

kotchwane
  • That's not 'the UTF-8 code', that's the Unicode code point! – Sören Jan 06 '22 at 12:05
  • You can also use just `"U+{:04X}".format(ord(c))`. Note the X instead of x. – Shawn Jan 06 '22 at 12:51
  • @Sören: Right, but note that the OP has meanwhile changed his mind and wants to have a code point. – user1934428 Jan 07 '22 at 08:54
  • @user1934428 I didn’t change my mind, I just wasn’t familiar with the correct terminology. I think the examples given in my question showed without ambiguity what I was trying to achieve. – kotchwane Jan 07 '22 at 09:25