2

I had a function in Python 2 that I used to generate a 22-character random string from a UUID:

def make_base64_string():
    return uuid.uuid4().bytes.encode("base64")[:22]

I have since started testing in Python 3 and just finished watching the Pragmatic Unicode presentation, most of which went way over my head. Anyway, I didn't expect this function to work in Python 3.4, and I was right.

So next I tried what I hoped was the solution, given that bytes.encode, from my understanding, is gone (treat everything as a byte, right?):

base64.b64encode(bytes(uuid.uuid4(), 'utf-8'))

but this gives me the following error...

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: encoding or errors without a string argument 

Why? I would like to understand this more.

Prometheus
  • 1
    This is what I suspect, but what I did not confirm: uuid.uuid4() does not return bytes, but `UUID` object. Thus the encoder cannot properly handle it as this object might not have default string presentation anymore. – Mikko Ohtamaa May 22 '15 at 12:32
  • You haven't given enough of a traceback for the first error to be useful – Eric May 22 '15 at 12:32
  • Some useful answers that really helped my understanding thank you. – Prometheus May 22 '15 at 12:39
  • Helpful reference: http://python-future.org/compatible_idioms.html?highlight=bytes#byte-string-literals – nu everest Aug 19 '16 at 23:42

2 Answers

4

base64.b64encode takes bytes as its argument, so there is no need to go through a string and re-encode it. Pass the UUID's raw bytes directly:

>>> base64.b64encode(uuid.uuid4().bytes)
b'58jvz9F7QXaulSScqus0NA=='
>>> base64.b64encode(uuid.uuid4().bytes)
b'gLV2vn/1RMSSckMd647jUg=='
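
For completeness, here is a minimal sketch of how the original Python 2 function from the question might be ported to Python 3 along these lines. It keeps the name `make_base64_string` from the question; the final `.decode("ascii")` is an assumption that a `str` result is wanted rather than `bytes`.

```python
import base64
import uuid


def make_base64_string():
    # uuid4().bytes is the raw 16-byte value of the UUID.
    # Base64 of 16 bytes is 24 characters, the last two being "==" padding.
    encoded = base64.b64encode(uuid.uuid4().bytes)
    # Keep the first 22 characters (dropping the padding) and return text.
    return encoded[:22].decode("ascii")


token = make_base64_string()
```

The result can contain `+` and `/`; if the string has to go into a URL or a file name, `base64.urlsafe_b64encode` is a drop-in alternative that uses `-` and `_` instead.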
Ilja Everilä
  • but this would not take the actual UUID but the byte representation, right? – Prometheus May 22 '15 at 12:42
  • @OrbiterFleet yes, which is the actual UUID. Also your original code in python2 did that (and the slice at 22. byte). – Ilja Everilä May 22 '15 at 12:46
  • Yep sorry the code did do that with ``uuid.uuid4().bytes``. I understand now! – Prometheus May 22 '15 at 12:49
  • 1
    The confusion stems usually from the fact that `str` in python2 is `bytes`, aka strings in python2 are byte strings, and you have to do extra to get unicode, like `u"stuff"` literals and `unicode(stuff)` calls. It takes some practice to understand the difference, but it's worth it, at least in my opinion :P (since we have to handle weird chars like ä and å all the time). – Ilja Everilä May 22 '15 at 12:54
  • Last question and I'm making this as accepted. Did I also misunderstand what ``uuid.uuid4()`` returns, I assumed it would return a string (or byte string as you said) however the function returns i.e. ``UUID('c75e94b7-964e-4d2f-ae18-be712f13f968')`` – Prometheus May 22 '15 at 12:57
  • 1
    Yes, it is an UUID object wrapping the actual byte representation, like @Mohammadhzp points out. It offers helpful methods for inspecting and using the value. – Ilja Everilä May 22 '15 at 12:59
  • Cool, so if I want the actual UUID string I use ``str(uuid.uuid4())`` or ``the_encoded.decode("utf-8")``, and for the byte string I use ``uuid.uuid4().bytes``. And b64encode only takes byte strings. I think that all makes sense now. – Prometheus May 22 '15 at 13:02
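
The distinction worked out in these comments can be checked directly. The sketch below reuses the UUID value quoted in the comment above as a fixed example; the variable names are only illustrative.

```python
import uuid

# A UUID object, not a string and not bytes.
u = uuid.UUID("c75e94b7-964e-4d2f-ae18-be712f13f968")

text = str(u)   # 36-character hex text with dashes
raw = u.bytes   # the 16 raw bytes the object wraps

# bytes(obj, encoding) only accepts a str as its first argument,
# so passing the UUID object itself raises TypeError.
error = None
try:
    bytes(u, "utf-8")
except TypeError as exc:
    error = exc
```

This is the `TypeError` from the question: an encoding was supplied, but the first argument was a `UUID` object rather than a string.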
1
>>> type(uuid.uuid4())
<class 'uuid.UUID'>

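This shows that uuid.uuid4() returns a UUID object, not a string, and bytes() with an encoding needs a string as its first argument. Converting the UUID to a string first makes the call work (note that this encodes the 36-character text form, not the 16 raw bytes):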

base64.b64encode(bytes(str(uuid.uuid4()), 'utf-8'))
Mohammadhzp
  • OK good, that makes sense. But why do we need ``bytes(`` if, as in @Ilja's answer, b64encode takes bytes as its argument? I would have guessed that ``base64.b64encode(str(uuid.uuid4()), 'utf-8')`` should work, but I get the error ``'str' does not support the buffer interface`` – Prometheus May 22 '15 at 12:47
  • Your question asked why the code does not work and said you wanted to understand it better. My answer is that you are passing an object instead of a string to bytes(). The other answer is certainly better, but mine is the direct reply to your question. base64.b64encode() expects its first parameter to be bytes, not a string, and does not take 'utf-8' as a second parameter, so base64.b64encode(str(uuid.uuid4()), 'utf-8') cannot work – Mohammadhzp May 22 '15 at 13:04
  • I voted this answer up, not sure why others voted it down. Maybe someone else could reply, I'm interested too. – Prometheus May 22 '15 at 13:11
  • The second part of your answer, though, takes a string UUID and gives a different result from mine using ``uuid.uuid4().bytes``. I could be wrong, but it's taking the string value and turning that into a byte string, right? – Prometheus May 22 '15 at 13:15
  • 1
    No, a byte string is different. In this case bytes() knows it has been given a string, which it turns into bytes. bytes() is useful for ASCII characters, which means it's OK here. You get a different result because the byte order in uuid.UUID().bytes is big-endian (my best guess) – Mohammadhzp May 22 '15 at 14:26
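
To see where the two results in this thread differ, the sketch below encodes the same UUID both ways. It reuses the example UUID value quoted earlier on this page. As far as I can tell, the difference comes from the input rather than from byte order: one call encodes the 36 characters of hex text, the other the 16 raw bytes.

```python
import base64
import uuid

u = uuid.UUID("c75e94b7-964e-4d2f-ae18-be712f13f968")

# Encode the text form: 36 ASCII bytes in, 48 Base64 characters out.
from_text = base64.b64encode(str(u).encode("utf-8"))

# Encode the raw value: 16 bytes in, 24 Base64 characters out (with "==").
from_raw = base64.b64encode(u.bytes)

# Decoding gives back exactly what went in, which shows the inputs differ.
decoded_text = base64.b64decode(from_text)
decoded_raw = base64.b64decode(from_raw)
```

Only the raw-bytes form reproduces what the original Python 2 code did, and only it is short enough for the 22-character slice to cover the whole UUID.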