1

So I'm trying to get the binary data field in a database as a hexadecimal string. I don't know if that is exactly what I'm looking for but that is my hunch. There is a column called status in my dataframe that says [binary data] in PostgreSQL but when I execute the following command it looks like this:

df = pd.read_sql_query("""SELECT * FROM public."Vehtek_SLG" LIMIT 1000""",con=engine.connect())

Status column

How do I get the actual data in that column?

Ravaal
  • 3,233
  • 6
  • 39
  • 66
  • Please don't post images of code, data, or Tracebacks. Copy and paste it as text then format it as code (select it and type `ctrl-k`) ... [Discourage screenshots of code and/or errors](https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-or-errors)...[Why not upload images of code on SO when asking a question?](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-on-so-when-asking-a-question) ... [You should not post code as an image because:...](https://meta.stackoverflow.com/a/285557/2823755) – wwii May 18 '21 at 13:47
  • [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – wwii May 18 '21 at 13:47

1 Answers1

2

It looks like your DataFrame has for each row a list of individual bytes instead of the entire hexadecimal bytes string. The Series df["status"].map(b"".join) will have the concatenated bytes strings.

import random

import pandas as pd


# Simulating lists of 10 bytes for each row
df = pd.DataFrame({
    "status": [
        [bytes([random.randint(0, 255)]) for _ in range(10)]
        for _ in range(5)
    ]
})

s = df["status"].map(b"".join)

Both objects look like:

# df
                                              status
0  [b'\xb3', b'f', b';', b'P', b'\xcb', b'\x9b', ...
1  [b'\xd2', b'\xe8', b'.', b'b', b'g', b'|', b'\...
2  [b'\xa7', b'\xe1', b'z', b'-', b'W', b'\xb8', ...
3  [b'\xc5', b'\xa9', b'\xd5', b'\xde', b'\x1d', ...
4  [b'\xa3', b'b', b')', b'\xe3', b'5', b'`', b'\...

# s
0       b'\xb3f;P\xcb\x9bi\xb0\x9e\xfd'
1            b'\xd2\xe8.bg|\x94O\x90\n'
2       b'\xa7\xe1z-W\xb8\xc2\x84\xb91'
3    b'\xc5\xa9\xd5\xde\x1d\x02*}I\x15'
4                b'\xa3b)\xe35`\x0ed#g'
Name: status, dtype: object

After coverting the status field to binary we can then use the following to make it hexadecimal.

df['status'] = s.apply(bytes.hex)

And now here is your field!

df['status'].head()

0    1f8b0800000000000400c554cd8ed33010beafb4ef6045...
1    1f8b0800000000000400c554cd8ed33010beafb4ef6045...
2    1f8b0800000000000400c554cd6e9b4010be47ca3bac50...
3    1f8b0800000000000400c554cd6e9b4010be47ca3bac50...
4    1f8b0800000000000400c554cd6e9b4010be47ca3bac50...
Ravaal
  • 3,233
  • 6
  • 39
  • 66
Roméo Després
  • 1,777
  • 2
  • 15
  • 30
  • Ok. So is `s` a hexadecimal string now? – Ravaal May 19 '21 at 12:52
  • 1
    It's a pandas [`Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html) containing Python [`bytes`](https://docs.python.org/3/library/stdtypes.html#bytes). If you want it to contain the hexadecimal representation of those `bytes` as `str`, you can use `s.apply(bytes.hex)`. – Roméo Després May 19 '21 at 13:19