I am trying to use the Python AnyTree module to map URL redirections into a tree without creating any duplicate nodes.

I've experimented with the code, using the AnyTree docs and similar questions, e.g. "Tree with no duplicate children".

My current code is:

from anytree import Node, RenderTree

root_nodes = []

# Collect the URLs that are never the target of a redirection; these are the roots.
for url in redirections:
    is_root = True
    for redir in redirections:
        if url['url'] == redir['redir_url']:
            is_root = False
    if is_root:
        root = Node(url['url'])
        root_nodes.append(root)


# Attach every redirection target somewhere under each root.
for root in root_nodes:
    for redir in redirections:
        if redir['url'] == root.name:
            sub = Node(redir['redir_url'], parent=root)
        else:
            sub = next((c for c in root.children if c.name == redir['url']), None)
            if sub is None:
                sub = Node(redir['redir_url'], parent=root)
            else:
                new_node = sub
                sub = Node(redir['redir_url'], parent=new_node)

Basically, given a list of redirections like:

redirections = [
    {
        'url': "alpha.com",
        'redir_url': "beta.com",
    },
    {
        'url': "alpha.com",
        'redir_url': "charlie.com",
    },
    {
        'url': "beta.com",
        'redir_url': "charlie.com",
    },
    {
        'url': "beta.com",
        'redir_url': "delta.com",
    },
    {
        'url': "delta.com",
        'redir_url': "foxtrot.com",
    },
    {
        'url': "foxtrot.com",
        'redir_url': "golf.com",
    },
    {
        'url': "india.com",
        'redir_url': "charlie.com",
    },
    {
        'url': "india.com",
        'redir_url': "juliet.com",
    },
]

I want AnyTree to produce an output like:

alpha.com -> beta.com -> charlie.com
                      -> delta.com -> foxtrot.com -> golf.com
          -> charlie.com

india.com -> charlie.com
          -> juliet.com

Instead, it currently prints:

alpha.com
├── beta.com
│   ├── charlie.com
│   └── delta.com
├── charlie.com
├── foxtrot.com
│   └── golf.com
├── charlie.com
└── juliet.com
india.com
├── beta.com
│   ├── charlie.com
│   └── delta.com
├── charlie.com
├── foxtrot.com
│   └── golf.com
├── charlie.com
└── juliet.com

As you can see, there are lots of duplicates. Also, foxtrot and golf aren't added to the delta chain. Finally, india has many redirections that do not actually occur from those URLs.

Note that the redirections array could be in any order (not necessarily the order in which the redirections occurred).

CryptoCat

1 Answer

You need a container that knows all the nodes and links them properly, so that each URL is created only once and reused whenever it appears again.

from anytree import Node, RenderTree

redirections = [
    {
        'url': "alpha.com",
        'redir_url': "beta.com",
    },
    {
        'url': "alpha.com",
        'redir_url': "charlie.com",
    },
    {
        'url': "beta.com",
        'redir_url': "charlie.com",
    },
    {
        'url': "beta.com",
        'redir_url': "delta.com",
    },
    {
        'url': "delta.com",
        'redir_url': "foxtrot.com",
    },
    {
        'url': "foxtrot.com",
        'redir_url': "golf.com",
    },
    {
        'url': "india.com",
        'redir_url': "charlie.com",
    },
    {
        'url': "india.com",
        'redir_url': "juliet.com",
    },
]


class Fab:

    def __init__(self):
        self.nodemap = {}

    @property
    def roots(self):
        return [node for node in self.nodemap.values() if node.is_root]

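    def create(self, name=None, childname=None):
        # `name` is the redirecting URL; `childname` is the URL it redirects to.
        node = self._create(name)
        if childname is not None:
            # A Node has a single parent, so a later call re-parents the child.
            self._create(childname).parent = node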

    def _create(self, name):
        nodemap = self.nodemap
        if name not in nodemap:
            node = nodemap[name] = Node(name)
        else:
            node = nodemap[name]
        return node


f = Fab()
for redirect in redirections:
    url = redirect['url']
    redir_url = redirect['redir_url']
    f.create(url, redir_url)

for root in f.roots:
    for pre, fill, node in RenderTree(root):
        print("%s%s" % (pre, node.name))

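This will give you the output below. Note that a Node can have only one parent, so charlie.com, which alpha.com, beta.com and india.com all redirect to, ends up only under the parent that was assigned last (india.com):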

alpha.com
└── beta.com
    └── delta.com
        └── foxtrot.com
            └── golf.com
india.com
├── charlie.com
└── juliet.com
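
To check which URLs are affected by the one-parent rule before building any tree, it can help to group the redirections by target. The following is only a sketch using the standard library; the function name `summarize` and its return shape are made up for illustration and are not part of anytree.

```python
from collections import defaultdict

redirections = [
    {'url': "alpha.com", 'redir_url': "beta.com"},
    {'url': "alpha.com", 'redir_url': "charlie.com"},
    {'url': "beta.com", 'redir_url': "charlie.com"},
    {'url': "beta.com", 'redir_url': "delta.com"},
    {'url': "delta.com", 'redir_url': "foxtrot.com"},
    {'url': "foxtrot.com", 'redir_url': "golf.com"},
    {'url': "india.com", 'redir_url': "charlie.com"},
    {'url': "india.com", 'redir_url': "juliet.com"},
]


def summarize(redirections):
    """Hypothetical helper: return (roots, shared targets)."""
    # Map each target to the list of URLs that redirect to it.
    sources = defaultdict(list)
    for r in redirections:
        sources[r['redir_url']].append(r['url'])
    # De-duplicate the source URLs while keeping first-seen order.
    urls = dict.fromkeys(r['url'] for r in redirections)
    # A root is a source URL that is never a redirect target.
    roots = [u for u in urls if u not in sources]
    # Targets with more than one source cannot live under all of them
    # if there is only one node per URL.
    shared = {t: s for t, s in sources.items() if len(s) > 1}
    return roots, shared


roots, shared = summarize(redirections)
print(roots)
print(shared)
```

Because the grouping looks at the whole list before deciding anything, the result does not depend on the order of the redirections.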

I will add a generic node factory ("fab") to anytree that solves this issue: https://github.com/c0fec0de/anytree/issues/122

c0fec0de