What would be a more efficient approach to split a KLV string into lists/tuples of key, length, value as elements?

To add a little background: the first 3 digits make up the key, and the next 2 indicate the length of the value.
I have been able to solve the problem with the following code, but I don't think my code and logic are the most efficient way to do the task. I would love to hear other opinions so that I can get better.

result = []

def klv_split(ss):
    while True:
        group1 = ss[:3]
        group2 = ss[3:5]
        print(group2)
        group3 = ss[5 : 5 + int(group2)]
        result.append([group1, group2, group3])
        try:
            klv_split(ss[5 + int(group2) :])
        except ValueError:
            break
        break

    return result


klv_string = "0021571583400000026400412000000000200026047299049000850025003ADV25110Blahbleble25304677225400255002560204"
klv_split(klv_string)
print(result)

The expected output is a list of small ones with key-length-value as below.

[['002', '15', '715834000000264'], ['004', '12', '000000000200'], ['026', '04', '7299'], ['049', '00', ''], ['085', '00', ''], ['250', '03', 'ADV'], ['251', '10', 'Blahbleble'], ['253', '04', '6772'], ['254', '00', ''], ['255', '00', ''], ['256', '02', '04']]
Anh D Vu
  • Perhaps you could use regex for this, see [here](https://stackoverflow.com/questions/14814186/python-splitting-by-certain-pattern) –  May 06 '19 at 09:23

3 Answers

The other answers turn your recursive function into an iterative one. That will be faster, since Python does not optimize tail calls, and it also avoids hitting the recursion limit on long inputs.
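To illustrate the point about recursion, here is a minimal sketch. The function name `klv_split_recursive` and the synthetic input are made up for this example and are not the code from the question. It builds an input with more records than the interpreter's recursion limit and shows that a one-call-per-record design fails on it.

```python
import sys


def klv_split_recursive(ss, acc=None):
    """Hypothetical recursive splitter: one call per KLV record."""
    acc = [] if acc is None else acc
    if not ss:
        return acc
    length = int(ss[3:5])
    acc.append((ss[:3], ss[3:5], ss[5:5 + length]))
    # Python does not eliminate this tail call, so every record
    # adds a stack frame.
    return klv_split_recursive(ss[5 + length:], acc)


# More empty-valued records than the recursion limit allows frames.
many_records = "00100" * (sys.getrecursionlimit() + 100)

try:
    klv_split_recursive(many_records)
    overflowed = False
except RecursionError:
    overflowed = True

print(overflowed)
```

A loop-based version has no such limit, because it never nests calls.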

I will focus on the case where you have a huge binary file to parse:

>>> def klvs(f):
...     while True:
...         k = f.read(3)
...         if not k:
...             return
...
...         k_length = f.read(2)
...         assert len(k_length) == 2
...         k_length = int(k_length)
...         value = f.read(k_length)
...         assert len(value) == k_length
...         yield (k, k_length, value)
...

It's more convenient to create an iterator (although it may not be faster). I used bytes since that's what you usually get for KLV data:

>>> klv_bytes = b"0021571583400000026400412000000000200026047299049000850025003ADV25110Blahbleble25304677225400255002560204"
>>> import io
>>> f = io.BytesIO(klv_bytes)
>>> list(klvs(f))
[(b'002', 15, b'715834000000264'), (b'004', 12, b'000000000200'), (b'026', 4, b'7299'), (b'049', 0, b''), (b'085', 0, b''), (b'250', 3, b'ADV'), (b'251', 10, b'Blahbleble'), (b'253', 4, b'6772'), (b'254', 0, b''), (b'255', 0, b''), (b'256', 2, b'04')]
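Because `klvs` is a generator, it only reads as far into the file as the caller asks. The sketch below repeats the function so it runs on its own, and uses a small temporary file as a stand-in for a huge one; the sample bytes and the use of `itertools.islice` are choices made for this illustration.

```python
import itertools
import os
import tempfile


def klvs(f):
    """Yield (key, length, value) tuples from a binary file-like object."""
    while True:
        k = f.read(3)
        if not k:
            return
        k_length = f.read(2)
        assert len(k_length) == 2
        k_length = int(k_length)
        value = f.read(k_length)
        assert len(value) == k_length
        yield (k, k_length, value)


# Write three sample records to a temporary file (stand-in for a large file).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"00203abc0040002602xy")
    path = tmp.name

try:
    with open(path, "rb") as f:
        # Only the first two records are read and parsed.
        first_two = list(itertools.islice(klvs(f), 2))
finally:
    os.remove(path)

print(first_two)  # [(b'002', 3, b'abc'), (b'004', 0, b'')]
```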

You might want to get an element by key or by index without creating all the tuples:

>>> import os
>>> def get(f, to_search):
...     i = 0
...     while True:
...         k = f.read(3)
...         if not k:
...             return None
...
...         k_length = f.read(2)
...         assert len(k_length) == 2
...         k_length = int(k_length)
...         if to_search(i, k):
...             value = f.read(k_length)
...             assert len(value) == k_length
...             return (k, k_length, value)
...         else:
...             f.seek(k_length, os.SEEK_CUR)
...         i += 1
...
>>> f = io.BytesIO(klv_bytes)
>>> get(f, lambda _, k: k==b"004")
(b'004', 12, b'000000000200')
>>> f = io.BytesIO(klv_bytes)
>>> get(f, lambda _, k: k==b"foo") is None
True
>>> f = io.BytesIO(klv_bytes)
>>> get(f, lambda i, _: i==10)
(b'256', 2, b'04')
>>> f = io.BytesIO(klv_bytes)
>>> get(f, lambda i, _: i==11) is None
True

Note that the get function is O(n), so building a list or dict once will be faster if you need to look up several elements.
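For repeated lookups, one pass can build a dict and each later lookup is then a plain dict access. This is a minimal sketch that assumes keys are unique (a repeated key would overwrite the earlier value); the name `index` is only illustrative, and `klvs` is repeated so the block runs on its own.

```python
import io


def klvs(f):
    """Yield (key, length, value) tuples from a binary file-like object."""
    while True:
        k = f.read(3)
        if not k:
            return
        k_length = f.read(2)
        assert len(k_length) == 2
        k_length = int(k_length)
        value = f.read(k_length)
        assert len(value) == k_length
        yield (k, k_length, value)


# First three records of the sample data.
klv_bytes = b"0021571583400000026400412000000000200026047299"

# One O(n) pass; assumes each key appears at most once.
index = {k: v for k, _, v in klvs(io.BytesIO(klv_bytes))}

print(index[b"004"])      # b'000000000200'
print(index.get(b"999"))  # None
```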

jferard

Use the information from size to do it:

def klv_split(ss):
    result = []
    while len(ss) != 0:
        group1 = ss[:3]
        group2 = ss[3:5]
        up_to = 5 + int(group2)
        group3 = ss[5:up_to]
        result.append((group1, group2, group3))
        ss = ss[up_to:]
    return result

Result:

[('002', '15', '715834000000264'), ('004', '12', '000000000200'), ('026', '04', '7299'), ('049', '00', ''), ('085', '00', ''), ('250', '03', 'ADV'), ('251', '10', 'Blahbleble'), ('253', '04', '6772'), ('254', '00', ''), ('255', '00', ''), ('256', '02', '04')]

Here you have the live example
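The loop above trusts the length field, so truncated or malformed input either raises from `int()` or silently yields a short value. A possible variant with explicit checks is sketched below. The name `klv_split_checked` and the error messages are made up for this example, and it tracks an offset instead of re-slicing the string, which is a different technique from the code above.

```python
def klv_split_checked(ss):
    """Split a KLV string, raising ValueError on malformed input."""
    result = []
    pos = 0
    while pos < len(ss):
        key = ss[pos:pos + 3]
        length_field = ss[pos + 3:pos + 5]
        # The header must be complete and the length must be two ASCII digits.
        if len(length_field) != 2 or not (
            length_field.isascii() and length_field.isdigit()
        ):
            raise ValueError(f"bad KLV header at offset {pos}")
        end = pos + 5 + int(length_field)
        value = ss[pos + 5:end]
        # Slicing never raises, so a short value has to be detected by length.
        if len(value) != int(length_field):
            raise ValueError(f"truncated value at offset {pos}")
        result.append((key, length_field, value))
        pos = end
    return result


print(klv_split_checked("00203abc00400"))
# [('002', '03', 'abc'), ('004', '00', '')]
```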

Netwave

Instead of a while True loop, you can use an index for your while loop.

klv_string = "0021571583400000026400412000000000200026047299049000850025003ADV25110Blahbleble25304677225400255002560204"

def klv_split(ss):
    idx = 0
    result = []
    # Run while the index is less than the length of the string
    while idx < len(ss):
        # Extract the groups using indexes
        group1 = ss[idx:idx+3]
        group2 = ss[idx+3:idx+5]
        group3 = ss[idx+5:idx+5 + int(group2)]
        result.append([group1, group2, group3])

        # Advance the index past this record
        idx += 5+int(group2)
    return result

print(klv_split(klv_string))

The output will be

[['002', '15', '715834000000264'], 
['004', '12', '000000000200'], 
['026', '04', '7299'], 
['049', '00', ''], 
['085', '00', ''], 
['250', '03', 'ADV'], 
['251', '10', 'Blahbleble'], 
['253', '04', '6772'], 
['254', '00', ''], 
['255', '00', ''], 
['256', '02', '04']]
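The same index-based loop can also be written as a generator, which avoids building the whole list when records are processed one at a time. This is only a sketch: the name `iter_klv` is hypothetical, and unlike the function above it yields the length as an `int`.

```python
def iter_klv(ss):
    """Yield (key, length, value) from a KLV string, one record at a time."""
    idx = 0
    while idx < len(ss):
        length = int(ss[idx + 3:idx + 5])
        end = idx + 5 + length
        yield ss[idx:idx + 3], length, ss[idx + 5:end]
        # Jump straight to the start of the next record.
        idx = end


records = list(iter_klv("00203abc0040002602xy"))
print(records)  # [('002', 3, 'abc'), ('004', 0, ''), ('026', 2, 'xy')]
```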
Devesh Kumar Singh
  • Your code helps me think better! I appreciate it :) – Anh D Vu May 06 '19 at 09:53
  • Great! Glad to help! If my answer helped you, please accept it by clicking the tick next to it :) @AnhDVu Also take a look at: stackoverflow.com/help/someone-answers – Devesh Kumar Singh May 06 '19 at 10:19