I have this list of hierarchical URLs:

data = ["https://python-rq.org/","https://python-rq.org/a","https://python-rq.org/a/b","https://python-rq.org/c"]

I want to dynamically build a nested dictionary in which every URL holds, as its children, the other URLs that are subfolders of it.

I already tried the following, but it does not return what I expect:

import re

result = []
for key,d in enumerate(data):
    form_dict = {}
    r_pattern = re.search(r"(http(s)?://(.*?)/)(.*)",d)
    r = r_pattern.group(4)
    if r == "":
        parent_url = r_pattern.group(3)
    else:
        parent_url = r_pattern.group(3) + "/"+r
    print(parent_url)
    temp_list = data.copy()
    temp_list.pop(key)
    form_dict["name"] = parent_url
    form_dict["children"] = []
    for t in temp_list:
        child_dict = {} 
        if parent_url in t:
            child_dict["name"] = t
            form_dict["children"].append(child_dict.copy())
    result.append(form_dict)

This is the expected output.

{
   "name":"https://python-rq.org/",
   "children":[
      {
         "name":"https://python-rq.org/a",
         "children":[
            {
               "name":"https://python-rq.org/a/b",
               "children":[

               ]
            }
         ]
      },
      {
         "name":"https://python-rq.org/c",
         "children":[

         ]
      }
   ]
}

Any advice?

Akaisteph7
  • It sounds like what you want is a trie. There are trie implementations in Python as well. See https://stackoverflow.com/questions/11015320/how-to-create-a-trie-in-python – ypnos Jul 09 '19 at 14:31
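
The trie idea from the comment above could be sketched roughly as follows. This is only a minimal illustration, not the linked implementation: the helper names build_trie and to_tree are made up here, and it assumes every URL parses cleanly with urllib.parse.urlsplit. The trie is keyed by path segment, with the scheme and host as the first key.

```python
from urllib.parse import urlsplit


def build_trie(urls):
    """Insert each URL into a nested dict keyed by path segment (hypothetical helper)."""
    trie = {}
    for url in urls:
        parts = urlsplit(url)
        # First key is the site root; the remaining keys are the non-empty path segments.
        keys = [f"{parts.scheme}://{parts.netloc}"]
        keys += [segment for segment in parts.path.split("/") if segment]
        node = trie
        for key in keys:
            node = node.setdefault(key, {})
    return trie


def to_tree(trie, prefix=""):
    """Convert the trie into a list of {"name", "children"} dictionaries (hypothetical helper)."""
    tree = []
    for key, subtrie in trie.items():
        name = f"{prefix}/{key}" if prefix else key
        tree.append({"name": name, "children": to_tree(subtrie, name)})
    return tree


data = ["https://python-rq.org/", "https://python-rq.org/a",
        "https://python-rq.org/a/b", "https://python-rq.org/c"]
tree = to_tree(build_trie(data))
print(tree)
```

Two things to be aware of with this sketch: the root is named without its trailing slash, and an intermediate folder gets a node even if it never appears in the input on its own (for example, passing only ".../a/b" still creates a node for ".../a").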

1 Answer

This was a nice problem. I tried to continue with your regex approach but got stuck, and found that split is a better fit for this case. The following works:

data = ["https://python-rq.org/","https://python-rq.org/a","https://python-rq.org/a/b","https://python-rq.org/c"]
# Remove the trailing "/" from any URL that ends with one. This makes the URLs
# much easier to match, and the slash is not needed for the link to be valid.
data = [x[:-1] if x.endswith("/") else x for x in data]
print(data)

result = []

# Recursively search the tree for the node whose name equals d (the parent URL)
def find_match(d, res):
    for t in res:
        if d == t["name"]:
            return t
        if t["children"]:
            temp = find_match(d, t["children"])
            if temp:
                return temp
    return None

# Note: this relies on every parent URL appearing in data before its children.
for d in data:
    # I dropped the regex because matching the last group wasn't working out;
    # split does just what you need here.
    parent = "/".join(d.split("/")[:-1])
    form_dict = {"name": d, "children": []}
    option = find_match(parent, result)
    if option:
        # The parent is already in the tree, so attach this URL to it
        option["children"].append(form_dict)
    else:
        # No parent found, so this URL becomes a top-level entry
        result.append(form_dict)

print(result)

Output of print(result):

[{'name': 'https://python-rq.org', 'children': [{'name': 'https://python-rq.org/a', 'children': [{'name': 'https://python-rq.org/a/b', 'children': []}]}, {'name': 'https://python-rq.org/c', 'children': []}]}]
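
The loop above only finds a parent that has already been added, so it depends on parents appearing before their children in the list. If the input order is not guaranteed, one possible variant is to create all nodes first and then link them through a dictionary lookup. This is a rough sketch rather than a drop-in replacement: build_tree and shuffled are names introduced here for illustration, and json.dumps is used only to print the result in the shape shown in the question.

```python
import json


def build_tree(urls):
    """Link each URL to its parent regardless of input order (hypothetical helper)."""
    # Strip trailing slashes so that "https://python-rq.org/" matches as a parent.
    names = [url.rstrip("/") for url in urls]
    # Create every node up front, keyed by its URL.
    nodes = {name: {"name": name, "children": []} for name in names}
    roots = []
    for name, node in nodes.items():
        # The parent URL is everything before the last "/".
        parent = name.rsplit("/", 1)[0]
        if parent in nodes:
            nodes[parent]["children"].append(node)
        else:
            # No direct parent in the input, so keep it as a top-level entry.
            roots.append(node)
    return roots


# Same URLs as in the question, deliberately out of order.
shuffled = ["https://python-rq.org/a/b", "https://python-rq.org/c",
            "https://python-rq.org/", "https://python-rq.org/a"]
tree = build_tree(shuffled)
print(json.dumps(tree, indent=3))
```

Children end up in input order here, and a URL whose direct parent is missing from the list is treated as its own top-level entry, which matches what the loop above does in that situation.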
Akaisteph7