I am trying to use the Python AnyTree module to map URL redirections into a tree without creating any duplicate nodes.
I've experimented with the code, drawing on the AnyTree docs and similar questions, e.g. "Tree with no duplicate children".
My current code is:
from anytree import Node, RenderTree

# A URL is a root if no redirection points to it
root_nodes = []
for url in redirections:
    is_root = True
    for redir in redirections:
        if url['url'] == redir['redir_url']:
            is_root = False
    if is_root:
        root = Node(url['url'])
        root_nodes.append(root)

# Attach every redirection somewhere under each root
for root in root_nodes:
    for redir in redirections:
        if redir['url'] == root.name:
            sub = Node(redir['redir_url'], parent=root)
        else:
            # Only looks at the root's direct children
            sub = next((c for c in root.children if c.name == redir['url']), None)
            if sub is None:
                sub = Node(redir['redir_url'], parent=root)
            else:
                new_node = sub
                sub = Node(redir['redir_url'], parent=new_node)
Basically, given a list of redirections like:
redirections = [
    {
        'url': "alpha.com",
        'redir_url': "beta.com",
    },
    {
        'url': "alpha.com",
        'redir_url': "charlie.com",
    },
    {
        'url': "beta.com",
        'redir_url': "charlie.com",
    },
    {
        'url': "beta.com",
        'redir_url': "delta.com",
    },
    {
        'url': "delta.com",
        'redir_url': "foxtrot.com",
    },
    {
        'url': "foxtrot.com",
        'redir_url': "golf.com",
    },
    {
        'url': "india.com",
        'redir_url': "charlie.com",
    },
    {
        'url': "india.com",
        'redir_url': "juliet.com",
    },
]
I want AnyTree to produce an output like:
alpha.com -> beta.com -> charlie.com
                      -> delta.com -> foxtrot.com -> golf.com
          -> charlie.com
india.com -> charlie.com
          -> juliet.com
Instead, it currently prints:
alpha.com
├── beta.com
│   ├── charlie.com
│   └── delta.com
├── charlie.com
├── foxtrot.com
│   └── golf.com
├── charlie.com
└── juliet.com
india.com
├── beta.com
│   ├── charlie.com
│   └── delta.com
├── charlie.com
├── foxtrot.com
│   └── golf.com
├── charlie.com
└── juliet.com
As you can see, there are lots of duplicates. Also, foxtrot.com and golf.com aren't added to the delta.com chain. Finally, india.com ends up with many redirections that never originate from it or from its descendants.
Note that the redirections array could be in any order (not necessarily the order in which the redirections occurred).
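For reference, here is a minimal sketch of one way the desired tree could be built regardless of input order. It is not taken from the AnyTree docs: the helper names `index_redirections`, `find_roots` and `build_trees` are hypothetical, and it assumes every entry has the `url` and `redir_url` keys shown above. The idea is to index the redirections by source URL first, then start from the URLs that nothing redirects to and attach targets recursively.

```python
from collections import defaultdict


def index_redirections(redirections):
    """Map each source URL to its redirect targets, dropping duplicate pairs."""
    targets = defaultdict(list)
    for entry in redirections:
        if entry['redir_url'] not in targets[entry['url']]:
            targets[entry['url']].append(entry['redir_url'])
    return dict(targets)


def find_roots(targets):
    """Return the source URLs that are never the target of a redirection."""
    redirected_to = {t for ts in targets.values() for t in ts}
    return [url for url in targets if url not in redirected_to]


def build_trees(redirections):
    """Build one anytree tree per root URL (hypothetical helper)."""
    # Imported here so the two helpers above work without anytree installed.
    from anytree import Node

    targets = index_redirections(redirections)

    def attach(url, parent, seen):
        node = Node(url, parent=parent)
        for target in targets.get(url, []):
            if target not in seen:  # guard against redirect loops on this path
                attach(target, node, seen | {target})
        return node

    return [attach(root, None, {root}) for root in find_roots(targets)]
```

Each returned root can then be printed by iterating over `RenderTree(root)`. Because an anytree `Node` has exactly one parent, charlie.com would still appear as separate nodes under alpha.com, beta.com and india.com, which matches the desired output above. One limitation of this sketch: a group of URLs that only redirect to each other in a cycle has no root and would be skipped.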