
I am reading text containing emojis from a Postgres 13 database. Turns out that my Python/psycopg query does not decode/return the text as I would expect.

Via Postgres psql client

  • Within postgres:13 container

    select description from profile WHERE id = 123

Result is as expected!

🏳️‍🌈 and 👩‍💻👩‍⚕️

Via Python 3.9 with psycopg3 adapter

  • Within a python:3.9 container
  • same query as above

The result is not returned correctly: the emojis fail to combine.

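>>> cur = conn.cursor()
>>> cur.execute("select description from profile WHERE id = 123")  # same query as above; echoed cursor omitted
>>> profiles = cur.fetchone()
>>> profiles[0]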

🏳️\u200d🌈 and 👩\u200d💻👩\u200d⚕️

Connection object says it is using utf-8

>>> conn.info.encoding
'utf-8'

What am I missing here?

Any idea what I should be looking for?

Many thanks for your thoughts in advance, much appreciated! Eu

snakecharmerb
smile2day
  • Could You display response from database as hex (`encode('utf-32')`)? – Michas Jan 12 '22 at 14:14
  • Hi @Michas many thanks - txt.encode('utf-32') b'\xff\xfe\x00\x00\xf3\xf3\x01\x00\x0f\xfe\x00\x00\r \x00\x00\x08\xf3\x01\x00 \x00\x00\x00a\x00\x00\x00n\x00\x00\x00d\x00\x00\x00 \x00\x00\x00i\xf4\x01\x00\r \x00\x00\xbb\xf4\x01\x00i\xf4\x01\x00\r \x00\x00 – smile2day Jan 12 '22 at 14:55
  • Hi @LaurenzAlbe many thanks - this is a possibility. A reason why I mentioned it was within the python container. I will run more tests (save data to host) and compare with psql client output there – smile2day Jan 12 '22 at 15:03
  • https://emojipedia.org/emoji-sequence/ – JosefZ Jan 12 '22 at 15:15
  • Hi @LaurenzAlbe, just as you said it was the shell in the python:3.9 container. I saved the text to a file and viewed it in the host shell. All mashed-up emojis displayed correctly. Happy for you to add your answer here and I shall accept it. Many thanks :) – smile2day Jan 12 '22 at 15:18

2 Answers


The emojis consist of several characters, and the shell you are using does not know how to display them in the way you want.

JosefZ posted a link for more details: https://emojipedia.org/emoji-sequence/
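To see the "several characters" for yourself, here is a minimal sketch that lists the code points of one such sequence. The sample string is the rainbow flag written with escapes, and the helper name `describe` is only illustrative, not part of psycopg or the original question.

```python
import unicodedata

# Rainbow flag emoji, spelled with escapes so this file stays ASCII-only:
# base emoji + variation selector + zero width joiner + second emoji.
text = "\U0001F3F3\uFE0F\u200D\U0001F308"


def describe(s):
    """Return one (code point, Unicode name) pair per character in s."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in s]


for code_point, name in describe(text):
    print(code_point, name)

# The interactive interpreter echoes repr(), which escapes U+200D;
# print() writes the raw characters and leaves the joining to the terminal.
print(repr(text))
print(text)
```

The string holds four code points, one of which is U+200D (ZERO WIDTH JOINER). Note that evaluating a bare expression at the `>>>` prompt shows its `repr()`, which writes that joiner as the escape `\u200d`, while `print()` emits it unchanged. Whether the pieces are then drawn as a single glyph depends on the terminal and its font.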

Laurenz Albe

I analysed the raw bytes of the string (provided in the comments). The profiles[0] variable holds a valid string. The problem lies only in how the string is displayed: the more complex emoji sequences, which are joined with ZWJ (\u200d), are not supported by the shell.
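The check can be reproduced roughly as follows. This is a sketch that decodes the UTF-32 bytes exactly as they were pasted in the comment (which is cut off after the last joiner), so it covers only the start of the string. The variable names are mine.

```python
# Bytes copied from the comment; the paste ends after the third ZWJ.
raw = (
    b"\xff\xfe\x00\x00"  # UTF-32 little-endian byte order mark
    b"\xf3\xf3\x01\x00\x0f\xfe\x00\x00\r \x00\x00\x08\xf3\x01\x00"
    b" \x00\x00\x00a\x00\x00\x00n\x00\x00\x00d\x00\x00\x00 \x00\x00\x00"
    b"i\xf4\x01\x00\r \x00\x00\xbb\xf4\x01\x00"
    b"i\xf4\x01\x00\r \x00\x00"
)

# The "utf-32" codec reads the byte order mark and drops it from the result.
decoded = raw.decode("utf-32")

# One entry per code point, e.g. "U+200D" for the zero width joiner.
code_points = [f"U+{ord(ch):04X}" for ch in decoded]
print(code_points)
```

Every four-byte unit decodes to a valid code point, and the joiners sit between the emoji where they should be, so the data coming out of the database is intact.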

To preview it, you can dump the content to a text file and open it in a web browser.
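A minimal sketch of that dump, assuming the text is already in a Python string. The function name `dump_preview` and the file name `emoji_preview.html` are made up for this example, and the sample string stands in for `profiles[0]`.

```python
import html
import tempfile
from pathlib import Path

# Stand-in for profiles[0]: rainbow flag, " and ", woman technologist.
text = "\U0001F3F3\uFE0F\u200D\U0001F308 and \U0001F469\u200D\U0001F4BB"


def dump_preview(s, path):
    """Write s into a small UTF-8 HTML page and return the path."""
    page = (
        "<!doctype html>\n"
        '<meta charset="utf-8">\n'
        f"<p>{html.escape(s)}</p>\n"
    )
    out = Path(path)
    out.write_text(page, encoding="utf-8")
    return out


# Hypothetical location; any writable path works.
preview = dump_preview(text, Path(tempfile.gettempdir()) / "emoji_preview.html")
print(preview)
```

Passing `encoding="utf-8"` explicitly and declaring the same charset in the page keeps the result independent of the container's locale. Open the printed path in a browser on the host to see the combined emojis.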

Michas