I have tree-like data that is constructed of Parent codes that contain Child codes which may act as parents themselves, depending on if they're marked as "SA". This data is present in an Excel sheet and looks like follows:
| Tree Level (A) | Code (B) | Spec (C) | Comm. Code (D) | Parent Code (J) |
|----------------|----------|----------|----------------|-----------------|
| 1 | A12 | 1 | SA | Mach |
| 2 | B41 | 2 | SA | A12 |
| 3 | A523 | 1 | BP | B41 |
| 2 | G32 | 4 | BP | A12 |
| 2 | D3F5 | 1 | SA | A12 |
| 3 | A12 | 4 | SA | D3F5 |
| 3 | A12 | 1 | SA | D3F5 |
There is one issue here: A12, at the top tree level (1), contains a child (D3F5), which itself contains another parent that's the same as D3F5's own parent. As you may be able to imagine, this (although not represented in the data as it is delivered to me) creates an endless loop, where A12 at tree level 3 unfolds the entire structure again and again.
Note that one of the two 'A12' children poses no problem, as this has a different specification as to the A12 parent at tree level 1.
I have a function that checks for this situation, but it is extremely slow as it uses nested loops to go through the rows, and the total row count can be several 1000s. The end goal is to show the user the deepest level at which the error occurs. In this example, that would be code A12
with spec 1
at tree level 3
:
def nested_parent(sht):
"""
Checks if a parent SA contains itself as a child.
:return: nested_parents: Dictionary of found 'nested parents'. None if none found
"""
nested_parents = {}
found = False
lrow = sht.Cells(sht.Rows.Count, 1).End(3).Row
parent_treelevel = 1
# Get deepest tree level, as this no longer contains children
last_treelevel = int(max([i[0] for i in sht.Range(sht.Cells(2, 1), sht.Cells(lrow, 1)).Value]))
# Loop through parent rows
print('Checking for nested parents...')
for i in range(2, lrow):
if sht.Cells(i, "D").Value == "SA":
parent_code, parent_treelevel = f'{sht.Cells(i, "B").Value}_{sht.Cells(i, "C")}', sht.Cells(i, "A").Value
# Add new key with list containing parent's tree level for parent code
if parent_code not in nested_parents:
nested_parents[parent_code] = [int(parent_treelevel)]
# Loop child rows
for j in range(i + 1, lrow + 1):
child_code, child_treelevel = f'{sht.Cells(j, "B").Value}_{sht.Cells(j, "C")}', sht.Cells(i, "A").Value
if child_code == parent_code and child_treelevel > parent_treelevel:
found = True
nested_parents[parent_code].append(int(child_treelevel))
if parent_treelevel == last_treelevel:
# End function if deepst tree level is reached
print("done")
if found:
# Delete keys that contain no information
delkeys = []
for key in reversed(nested_parents):
if len(nested_parents[key]) == 1:
delkeys.append(key)
for key in delkeys:
del nested_parents[key]
return nested_parents
else:
return
This function can be called as follows, where wb_name
is the name of the workbook that contains the data:
from win32com.client import GetObject
wb_name = "NAME"
sht = GetObject(None, "Excel.Application").Workbooks(wb_name).Worksheets(1)
def err(msg):
"""
stops the code from executing after printing an error message
"""
print("Unexpected error occured:", msg)
exit()
infloop = nested_parent(sht)
if infloop is not None:
dict_str = ''.join([f'Code: {key}, Tree levels: {infloop[key]}\n' for key in infloop])
err(f"Warning: one or more parent codes contain their own code as a child:\n{dict_str}")
I am hoping to speed up this code, as the rest of my script is fairly quick and its speed is being seriously hampered by this function.