
Intro

Hi, I'm doing some fun cryptanalysis exercises from cryptopals, and I have run into an 'issue' that has come up before and that I really don't understand.

Currently, I have a ciphertext that I read from a file in the following way:

from base64 import b64decode

with open("6.txt", 'r') as infile:
    b64_encoded = infile.readlines()

ciphertext = b64decode('\n'.join([x.strip() for x in b64_encoded if x != ""]))

It's now a bytes object, and looks like this when printed (this is just an excerpt):

b'\x1dB\x1fM\x0b\x0f\x02\x1fO\x13N<\x1aie\x1fI\x1c\x0eN\x13\x01\x0b\x07N\x1b\x01\x16E6\x00\x1e\x01Id T\x1d\x1dC3SNeR\x06\x00GT\x1c\rEM\x07\x04\x0cS\x12<\x0c\x1e\x08I\x1a\t\x11O\x14L!\x1aG+\x00\x05\x1dGY\x11\x04\t\x00d&\x07S\x007\x16\x06\x0c\x1a\x17A\x1d\x01RT0_\x00 \x13\n\x05GO\x12H\x08ENe>\x16\t8E\x06\x05\x08\x1aF\x07O\x1fYx~jb6\x0c\x1d\x0fA\rH\x06U\x1a\x1b\x00\x1dBt\x04\x1e\x01I\x1a\t\x11\x02Rz\x7fI\x00H:\x00\x1a\x13I\x1aOEH\x0f\x1d\rS\x04:\x01R\x19\x01\x0bA\x13\x06\x00L1_Sb\x15\x06\x07\t\x07T\x0b\x17A\x14\x16Iy35\x0b\x1b\x01\x05\x0fF\x07O\x1dNxNH\'R\x04\x07\x0cEXH\x08A\x00O T\x08t\x0b\x1d\x19I\x02\x00\x0e\x16\\\x00R0ie\x1fI\x02\x02T\x00\x01\x0b\x07N\x02\x10S\x01&\x10\x15M\x02\x07\x02\x1fO\x1bNx0i6R\n\x01\tT\x06\x07\tSN\x02\x10S\x08;\x10\x06\x05I\x0f\x0f\x10O;\x00:_G+\x1cId3OT\x02\ (...)

Context

I have gotten to a point in the exercise where I need to transpose this ciphertext according to a known key length k, so that I get a collection of strings in which string n contains all the ciphertext characters that would have been encrypted with the n'th character of the key.

That means that if I call my function transpose(ciphertext, keyLen) with arguments transpose("123456789", 3), then my output would be:

[['1', '4', '7'], ['2', '5', '8'], ['3', '6', '9']]

Problem

Ok, the transpose function I have made looks like this:

def transpose(string, n):
    buckets = [[] for i in range(n)]
    i = 0
    for c in string:
        buckets[i].append( c)
        i += 1
        if i > (n-1):
            i = 0
    return buckets

When I use it on a string, it works as expected and produces the output shown above for "123456789". But when I pass my bytes ciphertext, the output looks like this:

[[29, 54, 60, 55, 56, 116, 58, 53, 116, 38, 59, 116, 94, 58, 55, 49, 57, 57, 59, 61, 53, 34, 53, 58, 59, 116, 36, 32, 101, 48, 60, 51, 58, 116, 53, 58, 115, 116, 116, 49, 33, 13, 116, 60, 54, 59, 122, 59, 44, 53, 53, 116, 49, 50, 116, 45, 51, 49, 45, 59, 53, 38, 116, 94, 60, 48, 94, 59, 116, 49, 116, 54, 55, 116, 116, 58, 115, 33, 49, 59, 53, 58, 32, 49, 38, 53, 53, 45, 59, 116, 45, 61, 59, 94, 54, 32, 48, 55, 120, 39], [66, 0, 12, 22, 69, 4, 1, 11, 11, 16, 16, 12, 40, 66, 4, 69, 11, 0, 9, 11, 22, 0, 11, 66, 11, 10, 9, 69, 72, 69, 28, 12, 69, 111, 14, 69, 22, 13, 17, 23, 17, 10, 11, 69, 9, 11, 69, 8, 12, 69, 11, 44, 69, 23, 34, 12, 13, 23, 69, 69, 23, 10, 8, 54, 17, 69, 60, 18, 12, 4, 60, 10, 4, 4, 9, 69, 69, 23, 73, 8, 23, 28, 69, 69, 28, 28, 28, 69, 69, 111, 69, 0, 8, 53, 10, 13, 0, 73, 69, 12], [31, 30, 30, 6, 6, 30, 82, 27, 29, 21, 6, 6, 11, 94, 7, 120, 94, 82, 82, 85, 23, 82, 82, 82, 23, 20, 19, 7, 64, 120, 31, 30, 23, 59, 23, 95, 82, 19, 29, 94, 82, 7, 29, 31, 11, 83, 36, 27, 17, 5, 22, 17, 19, 23, 30, 28, 82, 23, 6, 26, 6, 7, 27, 2, 82, 31, 29, 28, 28, 0, 61, 22, 28, 28, 11, 17, 19, 23, 82, 23, 23, 6, 6, 22, 16, 82, 94, 21, 5, 62, 6, 92, 23, 30, 11, 19, 0, 82, 49, 17], [77, 1, 8, 12, 5, 1, 25, 1, 25, 77, 5, 77, 77, 77, 30, 44, 77, 103, 25, 77, 77, 0, 9, 29, 77, 11, 20, 29, 64, 43 (...)

And now my bytes have been converted to their integer representations? This doesn't really make sense to me, since all I am doing is iterating through the bytes and placing them in buckets. Why are these bytes turned into integers if all you do is iterate over them?

Epsi95
  • ``bytes`` *are* sequences of 1 byte *integers*, not of "individual bytes". – MisterMiyagi Aug 26 '21 at 14:11
  • [PEP 358](https://www.python.org/dev/peps/pep-0358/) and [PEP 3137](https://www.python.org/dev/peps/pep-3137/) both say *that* ``bytes`` are sequences of integers, but do not give a rationale for it. – MisterMiyagi Aug 26 '21 at 14:15

1 Answer

Ah, I remember this problem from cryptopals. I was facepalming when I understood how this trick works.

As MisterMiyagi said, bytes is not a sequence of "bytes", but a sequence of ints.

If you index into an str, you get another str:

>>> type("abc"[0])
<class 'str'>

But with bytes:

>>> type(b"abc"[0])
<class 'int'>
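
If you want to stay in bytes while picking out a single element, slicing is one option: a slice of a bytes object is again a bytes object, even when it has length 1. A minimal sketch (the variable names are just for illustration):

```python
data = b"abc"

first_as_int = data[0]      # indexing yields an int
first_as_bytes = data[0:1]  # a length-1 slice yields bytes

print(first_as_int)    # 97
print(first_as_bytes)  # b'a'
```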

So it's your for loop that 'converts' them to int: iterating over a bytes object yields one int per element. list does the same thing directly:

>>> list(b"abc")
[97, 98, 99]

But the reverse is also easily possible:

>>> bytes([97, 98, 99])
b'abc'
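
For the transpose itself, one way to keep every bucket as bytes is to use stepped slices instead of appending element by element. This is only a sketch, not the only way to do it, and the name transpose_bytes is made up for this example:

```python
def transpose_bytes(data, n):
    """Split data into n buckets; bucket i holds items i, i+n, i+2n, ..."""
    # Slicing preserves the input type, so bytes in means bytes out.
    return [data[i::n] for i in range(n)]


print(transpose_bytes(b"123456789", 3))  # [b'147', b'258', b'369']
print(transpose_bytes("123456789", 3))   # ['147', '258', '369']
```

If you would rather keep your existing transpose function, wrapping each bucket of ints as bytes(bucket) should give you bytes objects back as well.
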
Uncle Dino