Why does my binary tree behave differently given different byte strings?

Question

I have been practicing recursion with python and currently am attempting to stop recursing all the way down to single bytes and instead stop at a certain byte size. In this example I choose 2, so in my code if a either of the potential children to be spawned is less than 2, it won't recurse and will just return the current node. It works fine with the first byte string, but fails with the next two. Why is this happening and how can I fix it?

Correct output for 1st b: stops recursing/creating children at size 3, because next generation of children have at least 1 child smaller than size 2

b'\x00\x01\x00\x02\x00\x03'
b'\x00\x01\x00'
b'\x02\x00\x03'

Incorrect output for 2nd b: Appears to be recursing until single bytes

b'L_]ju\x87\xd4\x14j\x1b> \xc52'
b'L_]ju\x87\xd4'
b'L_]'
b'ju\x87\xd4'
b'ju'
b'\x87\xd4'
b'\x14j\x1b> \xc52'
b'\x14j\x1b'
b'> \xc52'
b'> '
b'\xc52'

from random import randbytes


class Node:
    def __init__(self, value):
        self.value = value
        self.children = []
        self.parent = None
        self.bytesize = len(value)
  
    def make_children(self, child):
        child.parent = self
        self.children.append(child)

    def print_tree(self):
        print(self.value)
        if len(self.children) > 0: # leaf node case
            for child in self.children:
                child.print_tree()

def build_tree(value_list):
    root = Node(value_list)
    #if len(value_list) == 1:
    if len(value_list) / 2 < 2: # MODIFY TO STOP RECURSING IF SIZE OF CHILDREN WILL BE BELOW 2
        return root
        
    mid_point = len(value_list) // 2
    left_half = value_list[:mid_point]
    right_half = value_list[mid_point:]

    child1 = build_tree(left_half)
    root.make_children(child1)

    child2 = build_tree(right_half)
    root.make_children(child2)

    return root


if __name__ == '__main__':
    #list1 = [12, 7, 8, 15, 9]
    b = b'\x00\x01\x00\x02\x00\x03'
    #b = b'\x4c\x5f\x5d\x6a\x75\x87\xd4\x14\x6a\x1b\x3e\x20\xc5\x32'
    #b = randbytes(6)
    file = build_tree(b)
    file.print_tree()
    print(len(b))

Quick note: The output for 2b is also a bit unreadble, how could I keep it in the regular byte format without the output being converted to the various characters shown above? — Volapiik Vyrient, Apr 27 '22 at 08:42
Hi. I ran your code and got output: `"b'\x00\x01\x00\x02\x00\x03' b'\x00\x01\x00' b'\x02\x00\x03' 6"`. Is that not what you expected? — Stef, Apr 27 '22 at 12:56
Stef, that is indeed what was expected. What about for b'\x4c\x5f\x5d\x6a\x75\x87\xd4\x14\x6a\x1b\x3e\x20\xc5\x32'? Also trying using randbytes(6) and you will see there are instances when it prints 1 bytes. — Volapiik Vyrient, Apr 27 '22 at 13:48
I get `b'L_]ju\x87\xd4\x14j\x1b> \xc52' b'L_]ju\x87\xd4' b'L_]' b'ju\x87\xd4' b'ju' b'\x87\xd4' b'\x14j\x1b> \xc52' b'\x14j\x1b' b'> \xc52' b'> ' b'\xc52' 14 ` — Stef, Apr 27 '22 at 13:49
Yeah so aren't b'> ' or b'\xc52' single bytes? I do want to ask about the difference in formatting. b'> ' doesn't appear to have a slash, so could this be representative of more than a byte? — Volapiik Vyrient, Apr 27 '22 at 13:50
Note that `len(value_list) / 2 < 2` should probably be `len(value_list) // 2 < 2` or even better, `len(value_list) < 4`. The single-slash symbol `/` is floating-point division in python, not integer division. — Stef, Apr 27 '22 at 14:05
I get a return that appears to be what you want if I define a simple tree function like this: `def mytree(s): ... if len(s) < 4: ... return s ... else: ... m = len(s) // 2 ... return (mytree(s[:m]), mytree(s[m:])) ` — Stef, Apr 27 '22 at 14:10
Good catch, forgot to add that in. I just ran with that being fixed and still get same output. — Volapiik Vyrient, Apr 27 '22 at 14:10
`> ` also has length 2: first a chevron `'>'`, then a space `' '` — Stef, Apr 27 '22 at 14:15
I am currently adopting your code and attempting to run, will update on how it goes. — Volapiik Vyrient, Apr 27 '22 at 14:19
Unfortunately it looks like my code has the exact same logic as yours ;) It's shorted because I didn't bother with the `Node` class, but it should split the string at the same places. — Stef, Apr 27 '22 at 14:20
So if b'> ' is 2 bytes, then is my code actually working as intended? Is there anyway to keep the output in hex form, so it can be a bit more obvious? — Volapiik Vyrient, Apr 27 '22 at 14:20
See [What's the correct way to convert bytes to a hex string in Python 3?](https://stackoverflow.com/questions/6624453/whats-the-correct-way-to-convert-bytes-to-a-hex-string-in-python-3) — Stef, Apr 27 '22 at 14:21

Stef · Accepted Answer · 2022-04-28T07:42:28.513

1

Your code is actually working as intended. The two byte strings you mention both have 2 bytes, not 1.

Here is one way to display a bytestring that might make it more clear:

def print_string(s):
    print(' '.join(map('{:#2x}'.format, s)))

print_string(b'> ')
# 0x3e 0x20

print_string(b'\xc52')
# 0xc5 0x32

edited Apr 28 '22 at 07:42

answered Apr 27 '22 at 15:59

Stef

13,242
2
17
28

Why does my binary tree behave differently given different byte strings?

1 Answers1