
I have a column in my pandas dataframe that stores bytes. I believe the bytes are converted to a string when I put them in the dataframe, because a dataframe doesn't support actual bytes as a dtype. So instead of being b'1a2b', each column value ends up wrapped in a string, like this: "b'1a2b'".

I'm passing these values into a method that expects bytes. When I pass it like this ParseFromString("b'1a2b'"), I get the error message:

TypeError: memoryview: a bytes-like object is required, not 'str'

I'm not sure whether encode or decode works in this case, or whether there is some other way to convert these wrapped bytes back into bytes. (I'm using Python 3.)

Since these values are in a dataframe, I can use a helper method for the string --> bytes --> protocol buffer conversion, because the dataframe itself might not be able to store the values as bytes. For example, my_dataframe.apply(_helper_method_convert_string_to_bytes_to_protobuf).

florentine
  • Could you store the value as `1a2b` and then convert it to bytes with `'1a2b'.encode()` when you call the function? – PacketLoss Oct 21 '20 at 01:02
  • @PacketLoss How would you go from b'1a2b' to "1a2b"? The example b'1a2b' is an output of SerializeAsString method in the link above. – florentine Oct 21 '20 at 01:37

2 Answers


So the problem seems to be that you are unable to extract the bytes object from the string. When you pass the string to the function, which expects a bytes object like b'1a2b', it throws an error. My suggestion would be to try passing your string to eval. Like:

a = "b'1a2b'"
b = eval(a)

b is what you want. You haven't shared the code for your function, so I'm unable to amend the actual code for you.
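If the strings might come from a source you don't fully trust, `ast.literal_eval` from the standard library is a commonly suggested alternative to `eval`: it only parses Python literals and does not execute arbitrary code. A minimal sketch, where `wrapped_bytes_to_bytes` is a hypothetical helper name and not something from your code:

```python
import ast

def wrapped_bytes_to_bytes(text):
    """Turn a string such as "b'1a2b'" back into the bytes it represents."""
    value = ast.literal_eval(text)  # parses literals only, never runs code
    if not isinstance(value, bytes):
        raise ValueError(f"expected a bytes literal, got {type(value).__name__}")
    return value

raw = wrapped_bytes_to_bytes("b'1a2b'")
print(raw)        # b'1a2b'
print(type(raw))  # <class 'bytes'>
```

Assuming the wrapped values sit in a single column, a helper like this could probably be passed to that column's apply before calling ParseFromString on each result.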

Dharman
annicheez
  • https://stackoverflow.com/a/1832957/7942856 using `eval` is considered bad practice. – PacketLoss Oct 21 '20 at 01:43
  • @PacketLoss thanks for sharing this. I agree it is slow; however, the post overstates the case against `eval` judging from the arguments provided by others. I'm sure there's a more efficient solution than mine. – annicheez Oct 21 '20 at 01:48

You can take a few approaches here, noting that eval() is considered bad practice and it is best to avoid this where possible.

  1. Store the value as a plain string (for example 1a2b) and call encode() when you pass it to the function
  2. Extract the contents of the bytes representation from your string, then call encode() on the result

Ideally, you would store your bytes as 1a2b when importing the data. If that's not possible, you could use a regex to extract the contents of the string between b' and the closing ', and pass the result to encode().

import re

string = "b'1a2b'"

re.search(r"(?<=').*(?=')", string).group().encode()

Output:

# b'1a2b'

type(re.search(r"(?<=').*(?=')", string).group().encode())
# <class 'bytes'>
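One caveat worth checking against your real data: serialized protocol buffers often contain non-printable bytes, which show up in the wrapped string as escape sequences such as \x08. Calling encode() on the extracted text keeps those escapes as literal characters instead of turning them back into single bytes. The sketch below illustrates the difference with a made-up value; the `unicode_escape` round trip is one possible way to interpret the escapes and is an assumption on my part, not something taken from the question.

```python
import re

# The text b'\x08\x96\x01' as it would appear in the column (made-up example value).
wrapped = "b'\\x08\\x96\\x01'"

# Regex approach: extract the quoted part and encode it as-is.
inner = re.search(r"(?<=').*(?=')", wrapped).group()
as_text = inner.encode()
print(len(as_text))  # 12, each escape is still four literal characters

# Interpret the escapes so that each one becomes a single byte again.
decoded = inner.encode("latin-1").decode("unicode_escape").encode("latin-1")
print(len(decoded))  # 3
```

For a value like 1a2b, which has no escapes, both approaches give the same result.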
PacketLoss