I have a dataset as follows:

Unique Name  Parent  Child
US_SQ        A       A1
UC_LC        A       A2
UK_SJ        A2      A21
UI_QQ        B       B1

I want the output to look like this:

US_SQ
├── A1
└── UC_LC
    └── UK_SJ
UI_QQ
└── B1

In other words, I want the tree to show the Unique Name column value in place of the Parent name.

This is the code that I am using:

import pandas as pd
from anytree import Node, RenderTree

def add_nodes(nodes, parent, child):
    if parent not in nodes:
        nodes[parent] = Node(parent)  
    if child not in nodes:
        nodes[child] = Node(child)
    nodes[child].parent = nodes[parent]

data = pd.DataFrame(columns=["Parent","Child"], data=[["US_SQ","A","A1"],["UC_LC","A","A2"],["UK_SJ","A2","A21"],["UI_QQ","B","B1"]])
nodes = {}  # store references to created nodes 
# data.apply(lambda x: add_nodes(nodes, x["Parent"], x["Child"]), axis=1)  # 1-liner
for parent, child in zip(data["Parent"],data["Child"]):
    add_nodes(nodes, parent, child)

roots = list(data[~data["Parent"].isin(data["Child"])]["Parent"].unique())
for root in roots:         # you can skip this for roots[0], if there is no forest and just 1 tree
    for pre, _, node in RenderTree(nodes[root]):
        print("%s%s" % (pre, node.name))

Also, is there an efficient way to access the tree data, or a format to save it in, so that the parent or children of a node can be found easily?

The above data and problem are taken from here:

Read data from a pandas DataFrame and create a tree using anytree in python
  • hey @AbdullahAlMamun it's not clear how we get the output from the given dataset, please help to elaborate on that? If we know how the parent and child are decided we can surely come up with something. Is `Unique Name` an alias, if so is it an alias for parent or child? – รยקคгรђשค Nov 30 '22 at 06:19
  • Hey @รยקคгรђשค `Unique Name` is an alias for the Parent on each row. The parent and child are decided as usual from the Parent and Child columns. What I want to do is replace the parent name on each row with the `Unique Name` value. – Abdullah Al Mamun Nov 30 '22 at 14:49
  • I recommend creating an `aliasDict` first and then, instead of `add_nodes(nodes, parent, child)`, using `add_nodes(nodes, aliasDict[parent], aliasDict[child])`. – รยקคгรђשค Nov 30 '22 at 15:36
  • Can you please tell me where and what to write for `aliasDict`, and where to write the `add_nodes(nodes, aliasDict[parent], aliasDict[child])` call? Sorry, I am a newbie in Python. – Abdullah Al Mamun Nov 30 '22 at 16:15

1 Answer

There are two parts to your question.

1. Renaming the Node

To rename the nodes so that Unique Name is used as an alias for the Parent name, the aliasDict suggestion in the comments above works, but we can also modify the DataFrame directly and leave your code unchanged.

I have modified your DataFrame because the original does not run (it declares two columns but passes three values per row), and your code example does not clearly show that Unique Name is an alias for Parent in some cases.

data = pd.DataFrame(
    columns=["Unique Name", "Parent", "Child"],
    data=[
        ["US_SQ", "A", "A1"],
        ["US_SQ", "A", "A2"],
        ["UC_LC", "A2", "A21"],
        ["UI_QQ", "B", "B1"]
    ]
)

# Rename Parent and Child columns using aliasDict
aliasDict = dict(data[["Parent", "Unique Name"]].values)
data["Parent"] = data["Parent"].replace(aliasDict)
data["Child"] = data["Child"].replace(aliasDict)

# Your original code - unchanged
nodes = {}
for parent, child in zip(data["Parent"],data["Child"]):
    add_nodes(nodes, parent, child)
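
To check what the renaming step produces without installing pandas or anytree, the same logic can be sketched with the standard library only. This is a rough illustration, not a replacement for the code above: `rename_edges` and `render_forest` are hypothetical helper names, and the rows are the modified dataset from this answer.

```python
from collections import defaultdict

# Rows of (unique_name, parent, child), same as the modified DataFrame above.
rows = [
    ("US_SQ", "A", "A1"),
    ("US_SQ", "A", "A2"),
    ("UC_LC", "A2", "A21"),
    ("UI_QQ", "B", "B1"),
]


def rename_edges(rows):
    """Replace every parent label, wherever it appears, by its alias."""
    alias = {parent: unique for unique, parent, _ in rows}
    return [(alias.get(p, p), alias.get(c, c)) for _, p, c in rows]


def render_forest(edges):
    """Return the lines of a text rendering of every tree in the edge list."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    all_children = {child for _, child in edges}
    # Roots are parents that never appear as a child (insertion order is kept).
    roots = [p for p in children if p not in all_children]

    lines = []

    def walk(node, prefix):
        kids = children.get(node, [])
        for i, kid in enumerate(kids):
            last = i == len(kids) - 1
            lines.append(prefix + ("└── " if last else "├── ") + kid)
            walk(kid, prefix + ("    " if last else "│   "))

    for root in roots:
        lines.append(root)
        walk(root, "")
    return lines


edges = rename_edges(rows)
forest_lines = render_forest(edges)
print("\n".join(forest_lines))
```

Children that are never a parent (A1, A21, B1) have no alias and keep their original names, which is also what `replace(aliasDict)` does in the pandas version.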

2. Exporting to DataFrame

For the second part, anytree does not provide integration with pandas DataFrames. The bigtree Python package is an alternative that does this out of the box.

The whole example can be implemented as follows:

import pandas as pd
from bigtree import dataframe_to_tree_by_relation, print_tree, tree_to_dataframe

data = pd.DataFrame(
    columns=["Unique Name", "Parent", "Child"],
    data=[
        ["root", "root", "A"],  # added this line
        ["root", "root", "B"],  # added this line
        ["US_SQ", "A", "A1"],
        ["US_SQ", "A", "A2"],
        ["UC_LC", "A2", "A21"],
        ["UI_QQ", "B", "B1"]
    ]
)

# Rename Parent and Child columns using aliasDict (same as above)
aliasDict = dict(data[["Parent", "Unique Name"]].values)
data["Parent"] = data["Parent"].replace(aliasDict)
data["Child"] = data["Child"].replace(aliasDict)

# Create a tree from dataframe, print the tree
root = dataframe_to_tree_by_relation(data, parent_col="Parent", child_col="Child")
print_tree(root)
# root
# ├── US_SQ
# │   ├── A1
# │   └── UC_LC
# │       └── A21
# └── UI_QQ
#     └── B1

# Export tree to dataframe
tree_to_dataframe(root, parent_col="Parent", name_col="Child")
#                     path  Child Parent
# 0                  /root   root   None
# 1            /root/US_SQ  US_SQ   root
# 2         /root/US_SQ/A1     A1  US_SQ
# 3      /root/US_SQ/UC_LC  UC_LC  US_SQ
# 4  /root/US_SQ/UC_LC/A21    A21  UC_LC
# 5            /root/UI_QQ  UI_QQ   root
# 6         /root/UI_QQ/B1     B1  UI_QQ
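
For the "find the parent/child easily" part of the question, once the tree is a table of child and parent names like the export above, two dictionaries are enough for direct lookups. A minimal sketch, assuming node names are unique and that the pairs are copied by hand from that export; `parent_of`, `children_of` and `ancestors` are illustrative names, not bigtree API.

```python
from collections import defaultdict

# (child, parent) pairs copied from the exported table above; None marks the root.
records = [
    ("root", None),
    ("US_SQ", "root"),
    ("A1", "US_SQ"),
    ("UC_LC", "US_SQ"),
    ("A21", "UC_LC"),
    ("UI_QQ", "root"),
    ("B1", "UI_QQ"),
]

# child -> parent lookup
parent_of = dict(records)

# parent -> list of children lookup
children_of = defaultdict(list)
for child, parent in records:
    if parent is not None:
        children_of[parent].append(child)


def ancestors(name):
    """Walk up from a node to the root, nearest ancestor first."""
    chain = []
    current = parent_of[name]
    while current is not None:
        chain.append(current)
        current = parent_of[current]
    return chain


print(parent_of["A21"])      # UC_LC
print(children_of["US_SQ"])  # ['A1', 'UC_LC']
print(ancestors("A21"))      # ['UC_LC', 'US_SQ', 'root']
```

If names can repeat, keying the dictionaries by the `path` column instead of the name would avoid collisions.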

Source: I'm the creator of bigtree ;)

Kay Jan
  • Great! Seems it will work for me too. I will try it and update you soon. – Abdullah Al Mamun Dec 02 '22 at 17:17
  • Hey @Kay Jan I was trying to apply this code to my main dataset, which has thousands of rows. In that dataset some children are duplicated under different parents, and the code gives me the following error: `ValueError: There exists duplicate child with different parent where the child is also a parent node. Duplicated node names should not happen, but can only exist in leaf nodes to avoid confusion.` There is also another `cant find parent mode` issue. Do you know how to fix these issues? – Abdullah Al Mamun Dec 04 '22 at 15:25
  • Hello, duplicate child nodes with different parents are okay ONLY if the child nodes are leaf nodes. Otherwise there would be confusion. For example, A can have parent B and also parent C (this is okay), but if D then has parent A, does that refer to B/A or C/A? That case is ambiguous, so the code checks for and prohibits it. – Kay Jan Dec 06 '22 at 11:42
  • A workaround I can think of is to grow the tree up to the duplicated nodes, and then extend the tree using paths. Following the example above, we can add the path “C/A/D” for D with parent A, which removes the ambiguity about whether to add D to B/A or C/A. – Kay Jan Dec 06 '22 at 11:45
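
The rule described in the two comments above can be checked on an edge list before building the tree, so that the problem rows are known up front. This is only a sketch of the rule as stated in those comments, not bigtree's actual implementation; `find_ambiguous_nodes` is a hypothetical helper.

```python
from collections import defaultdict


def find_ambiguous_nodes(edges):
    """Return names that have more than one distinct parent and are also a parent."""
    parents_of = defaultdict(set)
    for parent, child in edges:
        parents_of[child].add(parent)
    all_parents = {parent for parent, _ in edges}
    return sorted(
        name
        for name, parents in parents_of.items()
        if len(parents) > 1 and name in all_parents
    )


# A sits under both B and C, which is fine while A is a leaf ...
leaf_duplicate = [("B", "A"), ("C", "A")]
# ... but once D is attached to A, it is unclear which A is meant.
ambiguous = leaf_duplicate + [("A", "D")]

print(find_ambiguous_nodes(leaf_duplicate))  # []
print(find_ambiguous_nodes(ambiguous))       # ['A']
```

Any name this returns is a candidate for the path-based workaround, where rows below it are added by full path (such as "C/A/D") rather than by parent name.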