0

WARNING: This is still Python 2.7!

For some reason the following function has extremely slow runtime:

def get_content(d,content):
    if isinstance(d,unicode):
        content += d
        content += u'\n'
    elif isinstance(d,dict):
        for key in d:
            content += get_content(d[key],content)
    elif isinstance(d,list):
        for el in d:
            content += get_content(el,content)
    return content

content = get_content(D,u'')

That is although D is quite small. Nothing crazy going on there size-wise.

Do you see what the problem is or what would be a better way to solve the task?

EDIT: I changed the code to ...

def get_content(d,content):
    if isinstance(d,unicode):
        content += [d]
    elif isinstance(d,dict):
        for key in d:
            content += get_content(d[key],content)
    elif isinstance(d,list):
        for el in d:
            content += get_content(el,content)
    return content

content = get_content(D,[])

... and it still has the same problem.

Radio Controlled
  • 825
  • 8
  • 23
  • How many hours does it run on your machine? – Thomas Weller Jul 14 '20 at 08:51
  • No, it's getting slower and slower something must be wrong with the recursion or there is some side-effect of the unicode that I am not aware of. The input XML is only 150KB large and then I am using xmltodict to make it to a dictionary and now I was trying to get the string contents. It's really small. – Radio Controlled Jul 14 '20 at 08:53
  • 1
    there is no base case in your recursion – bigbounty Jul 14 '20 at 08:54
  • 1
    Repeated string concatenation is not recommended. Try appending your data to a list and using `'\n'.join(your_list)` at the end. Currenlty each time you concatenate strings the whole string has to be recreated in memory. See https://stackoverflow.com/questions/3055477/how-slow-is-pythons-string-concatenation-vs-str-join – ScootCork Jul 14 '20 at 08:56
  • @bigbounty What is a base case? – Radio Controlled Jul 14 '20 at 09:03
  • @bigbounty: `return content` is the base case when all `if` and `elif` don't match or when `isinstance(d,unicode)` – Thomas Weller Jul 14 '20 at 09:05
  • 1
    @RadioControlled: a case where the recursion ends, i.e. `get_content()` is not called any more. – Thomas Weller Jul 14 '20 at 09:07
  • That's a good point though, maybe the problem has something to do with that. I am not that unexperienced in python and I am really puzzled at the moment ;) – Radio Controlled Jul 14 '20 at 09:10
  • 1
    Shouldn't you be passing an empty list for all `get_content` calls? e.g. only append the 'new' stuff – ScootCork Jul 14 '20 at 09:13
  • @ScootCork You solved the mystery! Is this worth writing an answer for? – Radio Controlled Jul 14 '20 at 09:15

1 Answers1

2

The problem is that you are reappending the whole content at each recursion. To solve pass an empty list to each get_content call instead.

def get_content(d,content):
    if isinstance(d,unicode):
        content += [d]
    elif isinstance(d,dict):
        for key in d:
            content += get_content(d[key],[])
    elif isinstance(d,list):
        for el in d:
            content += get_content(el,[])
    return content

content = get_content(D,[])
ScootCork
  • 3,411
  • 12
  • 22