I know there are quite a few similar questions out there, like:

Simple way to convert a string to a dictionary

But I am trying to do this without regex.

For example, I know I can do this:

string = "abc=123,xyz=456"
dict(x.split('=') for x in string.split(','))

To give:

{'xyz': '456', 'abc': '123'}

I am trying to do the same for nested dictionaries, and I would prefer to avoid regex as much as possible.

An example string I have is:

"{ currNode = {currIndex = 23, currElem = 0x0}, size = 23}"

The code should convert this to:

{ 'currNode': { 'currIndex':'23', 'currElem':'0x0'}, 'size':'23' }

which is basically a nested Python dictionary. The link I've included gives me an empty dict for this kind of example.

XChikuX
  • You could put this in a function, then check whether the value given by the regex contains `{}`s. If it does, recurse by passing the value back into the function. This problem screams recursion. There's likely already a built-in parser that can solve this, though. – Carcigenicate Jul 31 '18 at 21:37
  • This is basically how 'gdb' prints its data. Would you happen to know a built-in parser for such a thing? – XChikuX Jul 31 '18 at 21:54
  • No, honestly I don't use Python regularly. I only commented because you weren't getting other help. Sorry. – Carcigenicate Jul 31 '18 at 21:55
  • Ah, it's alright. I'm in the middle of writing a recursive function. I'll probably put that up as an answer myself. – XChikuX Jul 31 '18 at 21:58
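
The recursive idea from the comments can be sketched without regex or any library. This is only a minimal sketch, not the recursive function the asker mentions: the names `parse_struct` and `_parse_block` are hypothetical, every value is kept as a string, and it assumes values never contain `,`, `=`, `{` or `}` themselves (which real gdb output, for example quoted strings, may violate).

```python
def parse_struct(text):
    """Parse '{ key = value, key = { ... } }' text into nested dicts of strings."""
    # Start just after the first opening brace (str.index raises if there is none).
    result, _ = _parse_block(text, text.index("{") + 1)
    return result


def _parse_block(text, pos):
    """Parse from just after a '{' up to its matching '}'.

    Returns the parsed dict and the position just after the closing brace.
    """
    result = {}
    key = None
    token = ""
    while pos < len(text):
        char = text[pos]
        pos += 1
        if char == "=":
            # Everything collected so far is the key.
            key, token = token.strip(), ""
        elif char == "{":
            # The value is a nested block: recurse, then continue after it.
            result[key], pos = _parse_block(text, pos)
            key, token = None, ""
        elif char in ",}":
            # End of a plain "key = value" pair (if one is pending).
            if key is not None:
                result[key] = token.strip()
            key, token = None, ""
            if char == "}":
                return result, pos
        else:
            token += char
    raise ValueError("missing closing brace")


sample = "{ currNode = {currIndex = 23, currElem = 0x0}, size = 23}"
print(parse_struct(sample))
```

Each `{` hands control to a fresh call that returns its own dictionary plus the position where the caller should resume, so the nesting depth of the input maps directly onto the call stack.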

1 Answer

Python's standard library includes the json module, which already handles loading strings into dictionaries. It seems fairly simple to write a string formatting function that converts the input string to JSON and then loads it with the library function. Given that, this should work:

import json
import string
from pprint import pprint


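def convert(input_string):
    """Quote the bare tokens in input_string to make it JSON, then load it as a dict."""

    token_characters = string.ascii_letters + string.digits
    json_str = ""

    token_marker = False  # True while we are inside a key or value
    for char in input_string:
        if char in token_characters:
            if not token_marker:
                # First character of a token: open the quote.
                token_marker = True
                json_str += '"'
            json_str += char
        else:
            if token_marker:
                # First character after a token: close the quote.
                # Doing this before handling "=" keeps "a=1" (no spaces) working.
                token_marker = False
                json_str += '"'
            # "=" separates key and value; JSON uses ":" instead.
            json_str += ":" if char == "=" else char

    if token_marker:
        # The input ended in the middle of a token.
        json_str += '"'

    return json.loads(json_str)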


if __name__ == "__main__":
    a = "{ currNode = {currIndex = 23, currElem = 0x0}, size = 23}"
    pprint(convert(a))

This just walks the string, looks for characters that could be part of keys or values (tokens, in the code), and quotes them to produce a JSON-compatible string. You have to define your token characters correctly for it to work, though.
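
To make the quoting step and the token-character caveat concrete, here is a small self-contained sketch that returns the intermediate JSON text instead of loading it. The helper name `to_json_text` and its `token_characters` parameter are hypothetical and only serve this illustration.

```python
import json
import string

TOKEN_CHARACTERS = string.ascii_letters + string.digits


def to_json_text(text, token_characters=TOKEN_CHARACTERS):
    """Return the JSON text produced by quoting every run of token characters."""
    out = []
    in_token = False
    for char in text:
        is_token = char in token_characters
        if is_token != in_token:
            # Entering or leaving a token: emit the opening or closing quote.
            out.append('"')
            in_token = is_token
        # Outside tokens, "=" becomes the JSON key/value separator ":".
        out.append(":" if char == "=" and not is_token else char)
    if in_token:
        out.append('"')
    return "".join(out)


sample = "{ currNode = {currIndex = 23, currElem = 0x0}, size = 23}"
print(to_json_text(sample))
# { "currNode" : {"currIndex" : "23", "currElem" : "0x0"}, "size" : "23"}

# An underscore is not a token character by default, so the key is split in two
# and the result is no longer valid JSON:
print(to_json_text("{foo_bar = 1}"))
# {"foo"_"bar" : "1"}

# Adding "_" to the token characters fixes it:
print(to_json_text("{foo_bar = 1}", TOKEN_CHARACTERS + "_"))
# {"foo_bar" : "1"}
```

Note that every value ends up as a JSON string, which is why the loaded dictionary contains '23' rather than the integer 23.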

In theory you could reverse the logic and treat everything other than "{,= }" as a token character. Which version to use depends on whether your separators or your token characters are more consistent (or on which one you would have to write the fewest tests for). The latter approach seems like it may be better, though. Here is the flipped version:

def convert2(input_string):
    """Like convert(), but anything that is not a separator counts as a token character."""

    not_token_characters = "{=,: }"
    json_str = ""

    token_marker = False  # True while we are inside a key or value
    for char in input_string:
        if char not in not_token_characters:
            if not token_marker:
                # First character of a token: open the quote.
                token_marker = True
                json_str += '"'
            json_str += char
        else:
            if token_marker:
                # First separator after a token: close the quote.
                token_marker = False
                json_str += '"'
            # "=" separates key and value; JSON uses ":" instead.
            json_str += ":" if char == "=" else char

    if token_marker:
        # The input ended in the middle of a token.
        json_str += '"'

    return json.loads(json_str)

To make this really general purpose you'd probably have to add some additional error checking, but given the example I hope this gets you going.
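
As one possible shape for that error checking, the sketch below wraps the separator-based conversion and turns malformed input into a single, readable ValueError. The name `convert_checked` and its `separators` parameter are hypothetical, and the brace count is only a cheap sanity check, not full validation.

```python
import json


def convert_checked(text, separators="{}=,: "):
    """Quote tokens, load the result as JSON, and fail with a clear error."""
    if text.count("{") != text.count("}"):
        raise ValueError("unbalanced braces in %r" % text)

    pieces = []
    in_token = False
    for char in text:
        is_token = char not in separators
        if is_token != in_token:
            # Entering or leaving a token: emit the opening or closing quote.
            pieces.append('"')
            in_token = is_token
        pieces.append(":" if char == "=" and not is_token else char)
    if in_token:
        pieces.append('"')
    json_text = "".join(pieces)

    try:
        result = json.loads(json_text)
    except json.JSONDecodeError as exc:
        # Report both the original input and the JSON text that failed to load.
        raise ValueError(
            "could not parse %r (as JSON: %r)" % (text, json_text)
        ) from exc
    if not isinstance(result, dict):
        raise ValueError("expected a dictionary, got %s" % type(result).__name__)
    return result


print(convert_checked("{ currNode = {currIndex = 23, currElem = 0x0}, size = 23}"))
```

Callers then only need to handle ValueError, and the message shows the intermediate JSON text, which usually makes it obvious which character was misclassified.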

sehafoc