1

I can't seem to figure out how this would work in excel VBA

I have this relational data in excel like so:

enter image description here

Hierarchially / treeview, data looks like this:

enter image description here

The End result of the data should look like this (After using excel VBA script) where

  • columns A and B is relational Data
  • column E is input values for lookup
  • Column F is result ancestor value

enter image description here

My script thus far looks like this:

Public Sub DictionaryExamples()

    Dim sht As Worksheet: Set sht = ActiveSheet

    Dim exampleValues As Variant
    Dim i As Long
    Dim aKey As String
    Dim aValue As String
    Dim exampleDict As Object

    'Load values into a variant array
    exampleValues = Range("A1:B15").Value

    'Instantiate a dictionary
    Set exampleDict = CreateObject("scripting.dictionary")

    'Read all keys and values, and add them to the dictionary

    For i = 1 To UBound(exampleValues)
        aKey = CStr(exampleValues(i, 1))
        aValue = CStr(exampleValues(i, 2))
        exampleDict.Add aKey, aValue
    Next i

    'After Dictionary setup, use input values E to output Ancestor F
    Dim curCell As Range
    Dim LastRow As Long
    Dim temp As Variant

    LastRow = sht.Cells(Rows.Count, "E").End(xlUp).row

    'Loop through all values in parent to find ancestor
    For Each curCell In sht.Range("E1:E" & LastRow).Cells
        temp = curCell

        'Search Dictionary until no matches are found, that is ancestor
        Do
            If exampleDict.Exists(temp) Then
                temp = exampleDict(temp)
            Else
                'Print ancestor
                curCell(, 2).Value = temp
                Exit Do
            End If
        Loop

    Next



End Sub

Result so far: (not getting correct output values)

enter image description here

Essentially I am using a dictionary (A= key, B=value) to use as a lookup for (E=input) and then output (F=results)

I loop through multiple times until I find a key that has no pair, and use the latest working key value as the ancestor

The "root" words in the data is unncessary I just put it there for clarification, it could a null value I just wanted to clarify which input levels are already top level ancestor values

Vincent Tang
  • 3,758
  • 6
  • 45
  • 63
  • what's your recommendation on how to go about solving this problem? Besides using dictionaries. Maybe a Vlookup wrapped in a loop? Or just resort to python? – Vincent Tang Aug 17 '17 at 17:24
  • 1
    I guess that I was mistaken about this not being a hierarchy. I suppose that it is a 2 dimensional hierarchy. In any case, I am about to post the corrected code. –  Aug 17 '17 at 17:48

3 Answers3

1

You need to test whether your child node's parent is a root element or is a leaf (child node) itself before continuing your loop. Otherwise, you will always be writing the value of the parent node, which is "Root", and never the parent's name (key).

enter image description here Option Explicit

Public Sub DictionaryExamples()

    Dim sht As Worksheet: Set sht = ActiveSheet
    Dim exampleValues As Variant
    Dim i As Long
    Dim aKey As String, aValue As String
    Dim exampleDict As Object
    Dim curCell As Range

    'Load values into a variant array
    exampleValues = Range("A2:B15").Value

    'Instantiate a dictionary
    Set exampleDict = CreateObject("scripting.dictionary")

    'Read all keys and values, and add them to the dictionary

    For i = 1 To UBound(exampleValues)
        aKey = CStr(exampleValues(i, 1))
        aValue = CStr(exampleValues(i, 2))
        exampleDict.Add aKey, aValue
    Next i

    'After Dictionary setup, use input values E to output Ancestor F


    With sht
        'Loop through all values in parent to find ancestor
        For Each curCell In .Range("E2", .Cells(Rows.Count, "E").End(xlUp))
            aKey = curCell
            'If the
            If Not exampleDict.Exists(exampleDict(aKey)) Then
                'If the node is a parent node print it's value
                'To avoid confusion I'd have used: curCell(, 2).Value = "Parent Node"
                curCell(, 2).Value = exampleDict(aKey)
            Else
                'Search Dictionary until no matches are found, that is ancestor
                Do
                    If exampleDict.Exists(aKey) Then
                        'Here we test if this child node's parent is a root node
                        If Not exampleDict.Exists(exampleDict(aKey)) Then
                            'The child node's parent is a root node
                            curCell(, 2).Value = aKey
                            Exit Do
                        Else
                            'The child node's parent is also a leaf so continue
                            aKey = exampleDict(aKey)
                        End If
                    End If
                Loop
            End If
        Next
    End With

End Sub
  • some reason when i run 1000 input cells with my own relational data it doesn't populate anything in the "F" ancestor field. Dictionary is 100 cells long. Any idea on why this is? – Vincent Tang Aug 17 '17 at 18:22
  • 1
    If you email me the data I'll test it. tommy70458@gmail.com –  Aug 17 '17 at 18:24
0

Another solution (Not my original solution, received help elsewhere)

Option Explicit

Private Const LOOP_LIMIT As Integer = 100

Public Sub LineageDemo()

    Dim dict As Object
    Dim inputValues As Variant
    Dim outputValues As Variant
    Dim i As Long

    'Read relations into dictionary
    Set dict = BuildDictionaryOfRelations(Range("A2:A140"), Range("B2:B140"))

    'Read input values into variant array
    inputValues = Range("E2:E1465").Value

    'Write output
    ReDim outputValues(1 To UBound(inputValues), 1 To 1)
    For i = 1 To UBound(inputValues)
        outputValues(i, 1) = TraceAncestor(CStr(inputValues(i, 1)), dict, "Root")
    Next i

    Range("F2:F1465").Value = outputValues

End Sub

Private Function BuildDictionaryOfRelations(childRange As Range, parentRange As Range) As Object

    Dim childValues As Variant
    Dim parentValues As Variant
    Dim i As Long
    Dim aChild As String
    Dim aParent As String
    Dim dict As Object

    If childRange.Columns.Count <> 1 Or parentRange.Columns.Count <> 1 _
        Or childRange.Rows.Count <> parentRange.Rows.Count Then _
        Err.Raise vbObjectError + 1, Description:="Bad/inconsistent category ranges"

    'Load values into variant arrays
    childValues = childRange.Value
    parentValues = parentRange.Value

    'Instantiate a dictionary
    Set dict = CreateObject("scripting.dictionary")

    'Populate the dictionary

    For i = 1 To UBound(childValues)
        aChild = CStr(childValues(i, 1))
        aParent = CStr(parentValues(i, 1))
        If aChild = "pizza-oven" Then Stop
        dict.Add aChild, aParent
    Next i

    Set BuildDictionaryOfRelations = dict

End Function

Private Function TraceAncestor(aChild As String, relationDict As Object, rootString As String) As String

    Dim aParent As String
    Dim i As Integer

    If Not (relationDict.exists(aChild)) Then
        TraceAncestor = "ERROR: " & aChild & " does not appear in the CategoryName column"
        Exit Function
    End If

    'If aChild is a root, return root
    If relationDict.Item(aChild) = rootString Then
        TraceAncestor = rootString
        Exit Function
    End If

    'Trace from child to parent to parent to parent... to find ultimate ancestor
    For i = 1 To LOOP_LIMIT
        If Not (relationDict.exists(aChild)) Then
            TraceAncestor = "ERROR: " & aChild & " does not appear in the CategoryName column"
            Exit Function
        End If
        aParent = relationDict.Item(aChild)
        If aParent = rootString Then Exit For
        aChild = aParent
    Next i

    If i > LOOP_LIMIT Then
        TraceAncestor = "ERROR: Ancestor could not be found for " & aChild & " in " & LOOP_LIMIT & " iterations"
        Exit Function
    End If

    TraceAncestor = aChild

End Function
Vincent Tang
  • 3,758
  • 6
  • 45
  • 63
0

I only had about 1000 to 2000 cells or so, so I ended up using Jerry's Cascading tree formula here

https://sites.google.com/a/madrocketscientist.com/jerrybeaucaires-excelassistant/text-functions/cascading-tree

Option Explicit

Sub TreeStructure()
'JBeaucaire  3/6/2010, 10/25/2011
'Create a flow tree from a two-column accountability table
Dim LR As Long, NR As Long, i As Long, Rws As Long
Dim TopRng As Range, TopR As Range, cell As Range
Dim wsTree As Worksheet, wsData As Worksheet
Application.ScreenUpdating = False

'Find top level value(s)
Set wsData = Sheets("Input")
  'create a unique list of column A values in column M
    wsData.Range("A:A").AdvancedFilter Action:=xlFilterCopy, _
         CopyToRange:=wsData.Range("M1"), Unique:=True

  'Find the ONE value in column M that reports to no one, the person at the top
    wsData.Range("N2", wsData.Range("M" & Rows.Count).End(xlUp) _
        .Offset(0, 1)).FormulaR1C1 = "=IF(COUNTIF(C2,RC13)=0,1,"""")"
    Set TopRng = wsData.Columns("N:N").SpecialCells(xlCellTypeFormulas, 1).Offset(0, -1)
  'last row of persons listed in data table
    LR = wsData.Range("A" & wsData.Rows.Count).End(xlUp).Row

'Setup table
    Set wsTree = Sheets("LEVEL STRUCTURE")
    With wsTree
        .Cells.Clear    'clear prior output
        NR = 3          'next row to start entering names

'Parse each run from the top level
    For Each TopR In TopRng         'loop through each unique column A name
        .Range("B" & NR) = TopR
        Set cell = .Cells(NR, .Columns.Count).End(xlToLeft)

        Do Until cell.Column = 1
          'filter data to show current leader only
            wsData.Range("A:A").AutoFilter Field:=1, Criteria1:=cell
        'see how many rows this person has in the table
            LR = wsData.Range("A" & Rows.Count).End(xlUp).Row
            If LR > 1 Then
              'count how many people report to this person
                Rws = Application.WorksheetFunction.Subtotal(103, wsData.Range("B:B")) - 1
              'insert that many blank rows below their name and insert the names
                cell.Offset(1, 1).Resize(Rws).EntireRow.Insert xlShiftDown
                wsData.Range("B2:B" & LR).Copy cell.Offset(1, 1)
              'add a left border if this is the start of a new "group"
                If .Cells(.Rows.Count, cell.Column + 1).End(xlUp).Address _
                    <> cell.Offset(1, 1).Address Then _
                       .Range(cell.Offset(1, 1), cell.Offset(1, 1).End(xlDown)) _
                          .Borders(xlEdgeLeft).Weight = xlThick
            End If

            NR = NR + 1     'increment to the next row to enter the next top leader name
            Set cell = .Cells(NR, .Columns.Count).End(xlToLeft)
        Loop
    Next TopR

  'find the last used column
    i = .Cells.Find("*", .Cells(.Rows.Count, .Columns.Count), _
        SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column
  'format the used data range
    With Union(.Range(.[B1], .Cells(1, i)), .Range("B:BB").SpecialCells(xlCellTypeConstants, 23))
        .Interior.ColorIndex = 5
        .Font.ColorIndex = 2
        .Font.Bold = True
        .HorizontalAlignment = xlCenter
    End With
    .Range("B1").Interior.ColorIndex = 53
    .Range("B1").Value = "LEVEL 1"
    .Range("B1").AutoFill Destination:=.Range("B1", .Cells(1, i)), Type:=xlFillDefault
End With

wsData.AutoFilterMode = False
wsData.Range("M:N").ClearContents
wsTree.Activate
Application.ScreenUpdating = True
End Sub

This did 95% of the work I needed to do, the rest I just used excel formulas (no VBA needed afterwards)

steps to get ancestor data from start:

I did following procedure:

0: Have data in a relational data setup

1: Cleaned out any duplicate data conditional formatting for duplicates

2: Run Jerry's Excel VBA Macro. Results below

Col A | ColB  | ColC  | ColD  | ColE  | ColF  |
      | Lvl1  | Lvl2  | Lvl3  | Lvl4  | Lvl5  |

      | AAA   |       |       |       |       |
      |       | BBB   |       |       |       |
      |       | EEE   |       |       |       |
      |       | FFF   |       |       |       |       
      | CCC   |       |       |       |       |
      |       | GGG   |       |       |       |       
      |       |       | III   |       |       |
      |       |       |       | JJJ   |       |
      |       |       |       |       | KKK   |
      | DDD   |       |       |       |       |
      |       | HHH   |       |       |       |

3: Populate the the top level excel via copy+pasting through excel (I only had 3 parent top level categories, so it took 2 minutes)

Col A | ColB  | ColC  | ColD  | ColE  | ColF  |
      | Lvl1  | Lvl2  | Lvl3  | Lvl4  | Lvl5  |

      | AAA   |       |       |       |       |
      | AAA   | BBB   |       |       |       |
      | AAA   | EEE   |       |       |       |
      | AAA   | FFF   |       |       |       |       
      | CCC   |       |       |       |       |
      | CCC   | GGG   |       |       |       |       
      | CCC   |       | III   |       |       |
      | CCC   |       |       | JJJ   |       |
      | CCC   |       |       |       | KKK   |
      | DDD   |       |       |       |       |
      | DDD   | HHH   |       |       |       |

4: Then use a helper column in column A, via this formula

=IF(B19<>"", B19, IF(C19<>"",C19, IF(D19<>"",D19, IF(F19<>"",F19))))

where C, D,E,F are subcategories levels stemming from the parent (Column B). This searches values in column C for input, if its not there, then column D, then E, then F and copies whatever it finds first.

Col A | ColB  | ColC  | ColD  | ColE  | ColF  |
      | Lvl1  | Lvl2  | Lvl3  | Lvl4  | Lvl5  |

      | AAA   |       |       |       |       |
  BBB | AAA   | BBB   |       |       |       |
  EEE | AAA   | EEE   |       |       |       |
  FFF | AAA   | FFF   |       |       |       |       
      | CCC   |       |       |       |       |
  GGG | CCC   | GGG   |       |       |       |       
  III | CCC   |       | III   |       |       |
  JJJ | CCC   |       |       | JJJ   |       |
  KKK | CCC   |       |       |       | KKK   |
      | DDD   |       |       |       |       |
  HHH | DDD   | HHH   |       |       |       |

5: Then use an index / match function now that all data is normalized (on columns A and B) using my original input values as the lookup

6: Clean up any data afterwards manually

Step 3 can be easily macro'd for larger sets of data, just traverse that column and paste until it finds the next value down.

Vincent Tang
  • 3,758
  • 6
  • 45
  • 63