Python 3 encoding or errors without a string argument

Question

I had a function in Python 2 that I used to generate a random 22-character string from a UUID:

def make_base64_string():
    return uuid.uuid4().bytes.encode("base64")[:22]

I have since started testing in Python 3 and just finished watching the Pragmatic Unicode presentation, most of which went way over my head. Anyway, I didn't expect this function to work in Python 3.4, and I was right.

So next I tried what I hoped was the solution, given that bytes.encode, from my understanding, is gone (treat everything as bytes, right?):

base64.b64encode(bytes(uuid.uuid4(), 'utf-8'))

but this gives me the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: encoding or errors without a string argument

Why? I would like to understand this more.

This is what I suspect, but have not confirmed: uuid.uuid4() does not return bytes, but a `UUID` object. Thus the encoder cannot handle it properly, as this object might not have a default string representation anymore. – Mikko Ohtamaa, May 22 '15 at 12:32
You haven't given enough of a traceback for the first error to be useful. – Eric, May 22 '15 at 12:32
Some useful answers that really helped my understanding, thank you. – Prometheus, May 22 '15 at 12:39
Helpful reference: http://python-future.org/compatible_idioms.html?highlight=bytes#byte-string-literals – nu everest, Aug 19 '16 at 23:42

score 4 · Accepted Answer · answered May 22 '15 at 12:33

base64.b64encode takes bytes as its argument, so there is no need to decode and re-encode.

>>> base64.b64encode(uuid.uuid4().bytes)
b'58jvz9F7QXaulSScqus0NA=='
>>> base64.b64encode(uuid.uuid4().bytes)
b'gLV2vn/1RMSSckMd647jUg=='
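
A possible Python 3 port of the original make_base64_string, given only as a sketch: it assumes the goal is still a 22-character text string, and the final decode to str is an addition that the Python 2 version did not need. Because a UUID is 16 bytes, b64encode returns 24 characters ending in two "=" padding characters, so slicing at 22 drops only the padding.

```python
import base64
import uuid


def make_base64_string():
    # uuid4().bytes is the 16 raw bytes of the UUID.
    encoded = base64.b64encode(uuid.uuid4().bytes)
    # 16 bytes encode to 24 base64 characters, the last two being "==".
    # Slicing at 22 removes that padding; decode turns bytes into str.
    return encoded[:22].decode("ascii")


token = make_base64_string()
```

To recover the UUID from such a token, the padding would have to be added back first, for example uuid.UUID(bytes=base64.b64decode(token + "==")).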


Ilja Everilä


But this would not take the actual UUID but the byte representation, right? – Prometheus May 22 '15 at 12:42
@OrbiterFleet Yes, and that is the actual UUID. Your original Python 2 code did the same (and sliced the result at the 22nd byte). – Ilja Everilä May 22 '15 at 12:46
Yep, sorry, the code did do that with ``uuid.uuid4().bytes``. I understand now! – Prometheus May 22 '15 at 12:49
The confusion usually stems from the fact that `str` in Python 2 is `bytes`: strings in Python 2 are byte strings, and you have to do extra work to get unicode, such as `u"stuff"` literals and `unicode(stuff)` calls. It takes some practice to understand the difference, but it's worth it, at least in my opinion :P (since we have to handle weird chars like ä and å all the time). – Ilja Everilä May 22 '15 at 12:54
Last question, and then I'm marking this as accepted. Did I also misunderstand what ``uuid.uuid4()`` returns? I assumed it would return a string (or byte string, as you said), but the function returns e.g. ``UUID('c75e94b7-964e-4d2f-ae18-be712f13f968')``. – Prometheus May 22 '15 at 12:57
Yes, it is a UUID object wrapping the actual byte representation, as @Mohammadhzp points out. It offers helpful methods for inspecting and using the value. – Ilja Everilä May 22 '15 at 12:59
Cool, so if I want the actual UUID string I do ``str(uuid.uuid4())`` or ``the_edcoded.decode("utf-8")``. For the byte string I do ``uuid.uuid4().bytes``. And b64encode only takes ``byte strings``. I think that all makes sense now. – Prometheus May 22 '15 at 13:02
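
The conversions discussed in these comments can be laid out side by side. This is a minimal sketch that reuses the UUID value quoted in the comment above; the variable names are illustrative only. One thing it makes visible: calling decode on the base64 output yields the base64 text, not the hyphenated UUID string that str() produces.

```python
import base64
import uuid

# UUID value quoted in the comment thread; any UUID would do.
u = uuid.UUID("c75e94b7-964e-4d2f-ae18-be712f13f968")

text = str(u)                       # hyphenated hex form, a 36-character str
raw = u.bytes                       # the 16 raw bytes of the UUID
encoded = base64.b64encode(raw)     # bytes in, bytes out
as_text = encoded.decode("ascii")   # base64 text as str, not the hyphenated form

# The raw bytes round-trip back to the same UUID.
restored = uuid.UUID(bytes=base64.b64decode(encoded))
```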

score 1 · Answer 2 · answered May 22 '15 at 12:33

>>> type(uuid.uuid4())
<class 'uuid.UUID'>

This shows that uuid.uuid4() is not a string. When you give bytes() an encoding, its first argument must be a string, so here is the working code:

base64.b64encode(bytes(str(uuid.uuid4()), 'utf-8'))
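
The failure and the fix can be reproduced in a few lines. This is a sketch, not part of the original answer: the exact wording of the TypeError message may differ between Python 3 versions, so the code captures it rather than assuming it.

```python
import uuid

u = uuid.uuid4()

# Passing an encoding tells bytes() to expect a str as the first argument.
# A UUID object is not a str, so this raises TypeError.
try:
    bytes(u, "utf-8")
    message = None
except TypeError as exc:
    message = str(exc)

# Converting to str first works, and is equivalent to calling str.encode.
as_bytes = bytes(str(u), "utf-8")
same = as_bytes == str(u).encode("utf-8")
```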


Mohammadhzp


OK good, that makes sense. But why do we need ``bytes(`` if, as in @Ilja's answer, b64encode takes bytes as its argument? I would have guessed that ``base64.b64encode(str(uuid.uuid4()), 'utf-8')`` should have worked, but I get the error ``'str' does not support the buffer interface``. – Prometheus May 22 '15 at 12:47
Your question was why this code is not working, and you wanted to understand it better. My answer is that you are passing an object instead of a string to bytes(). The other answer is certainly better, but mine is the direct reply to your question. base64.b64encode() expects its first parameter to be bytes, not a string, and does not take 'utf-8' as a second parameter, so base64.b64encode(str(uuid.uuid4()), 'utf-8') cannot work. – Mohammadhzp May 22 '15 at 13:04
I voted this answer up, not sure why others voted it down. Maybe someone else could reply, I'm interested too. – Prometheus May 22 '15 at 13:11
The second part of your answer, though, takes a string UUID and gives a different result from mine using ``uuid.uuid4().bytes``. I could be wrong, but it's taking the string value and turning that into a byte string, right? – Prometheus May 22 '15 at 13:15
No, a byte string is different. In this case bytes() knows it is given a string and turns it into bytes. bytes() is fine for ASCII characters, which means it's OK here. You get a different result because the byte order in uuid.UUID().bytes is big-endian (my best guess). – Mohammadhzp May 22 '15 at 14:26
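
A short check of why the two approaches give different results. Offered as a sketch that goes beyond the comment above: whatever the byte order of .bytes, the two calls are given different inputs. One encodes the 16 raw bytes of the UUID, the other encodes the 36 ASCII characters of its hyphenated hex form, so the outputs differ in both length and content.

```python
import base64
import uuid

u = uuid.uuid4()

# Approach from the accepted answer: encode the 16 raw bytes.
from_raw = base64.b64encode(u.bytes)

# Approach from this answer: encode the 36-character text form.
from_text = base64.b64encode(str(u).encode("ascii"))

# Each result decodes back to its own input, not to the other's.
raw_back = base64.b64decode(from_raw)
text_back = base64.b64decode(from_text).decode("ascii")
```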
