0

Alright, this is a very specific question. I have an excel macro written that takes a web URL, delimits it, transposes it, and then adds adjacent columns that describe the information in the originally transposed columns. Now, I need to add something to my macro that will loop through and check if the first character of one cell matches one of the first 4 characters of another cell. If it does, I need to concatenate strings from the descriptive columns to new cells. I'll illustrate this below:

enter image description here


3,435,201,0.5,%22type%25202%2520diabetes%22,0   Node    type 2 diabetes
4,165,97,0.5,%22diet%22,0                       Node    diet
5,149,248,0.5,%22lack%2520of%2520exercise%22,2  Node    lack of exercise
6,289,329,0.5,%22genetics%22,3                  Node    genetics
7,300,71,0.5,%22blood%2520pressure%2520%22,5    Node    blood pressure 
7,3,-7,1,0                                      Arrow   +
4,3,-21,1,0                                     Arrow   +
5,3,-22,1,0                                     Arrow   +
6,3,-34,1,0                                     Arrow   +
,7%5D                                           Tail     

I added color to make the concept of the problem more easily visualized. In row one of the first column, we see a red 3 that corresponds to 'type 2 diabetes'. In the fifth row of the first column, we see a blue 7 that corresponds to 'blood pressure'. These are both node objects, as the adjacent column signifies. In the sixth cell of the first column we see a blue 7 and a red 3. This indicates that an arrow (also signified by adjacent column) is connecting blood pressure to diabetes. In the next column over, we see an orange plus sign, which indicates this is a positive relationship.

The goal is to populate the next column over with "blood pressure + type diabetes", as I demonstrated in the image. So, I need some code to check the first characters in each node cell, and then compare them to the first 4 characters of each arrow cell. When an arrow that matches two of the nodes is found, I need the code to populate the row next to the + signs with a concatenated string comprised of the names of the nodes pertaining to that arrow, as well as the + sign between them (it's possible that it could also be a minus sign, but one isn't present in this example). Any pointers? I can't wrap my head around this. Edited to add Data

Here is the code of my current macro:

Sub Delimit_Transpose()
    
    Cells.Replace What:="],[", Replacement:="@", LookAt:=xlPart, SearchOrder _
        :=xlByRows, MatchCase:=False, SearchFormat:=False, ReplaceFormat:=False
    
    ActiveCell.FormulaR1C1 = "=RIGHT(R[-1]C,LEN(R[-1]C)-36)"
    
    Dim i As Long, strTxt As String
    Dim startP As Range
    Dim xRg As Range, yRg As Range
    On Error Resume Next
    Set xRg = Application.InputBox _
    (Prompt:="Range Selection...", _
    Title:="Delimit Transpose", Type:=8)
    i = 1
    Application.ScreenUpdating = False
    For Each yRg In xRg
        If i = 1 Then
            strTxt = yRg.Text
            i = 2
        Else
            strTxt = strTxt & "," & yRg.Text
        End If
    Next
    Application.ScreenUpdating = True
    Set startP = Application.InputBox _
    (Prompt:="Paste Range...", _
    Title:="Delimit Transpose", Type:=8)
    ary = Split(strTxt, "@")
    i = 1
    Application.ScreenUpdating = False
    For Each a In ary
        startP(i, 1).Value = Replace(Replace(a, "[", ""), "]", "")
        i = i + 1
    Next a
    
    i = 1
    For Each a In ary
       If Len(a) > 13 Then
           startP.Offset(i - 1, 1).Value = "Node"
        ElseIf Len(a) < 13 And Len(a) > 6 Then
            startP.Offset(i - 1, 1).Value = "Arrow"
        Else
            startP.Offset(i - 1, 1).Value = "Tail"
        End If
        i = i + 1
    Next a

    Dim openPos As Integer
    Dim closePos As Integer
    Dim midBit As String
    
    i = 1
    n = 5
    For Each a In ary
    openPos = InStr(a, ",%22")
     On Error Resume Next
    closePos = InStr(a, "%22,")
     On Error Resume Next
    midBit = Mid(a, openPos + 1, closePos - openPos - 1)
     On Error Resume Next
        If openPos <> 0 And Len(midBit) > 0 Then
            startP.Offset(i - 1, 2).Value = Replace(Replace(midBit, "%22", ""), "%2520", " ")
        ElseIf Len(a) < 13 And InStr(a, "-") = 4 Then
            startP.Offset(i - 1, 2).Value = "'-"
        ElseIf Len(a) < 7 Then
            startP.Offset(i - 1, 2).Value = " "
        Else
            startP.Offset(i - 1, 2).Value = "+"
        End If
        i = i + 1
        n = n + 1
    Next a

    Application.ScreenUpdating = True
End Sub

Rekigama
  • 25
  • 4
  • Cell 9 in first column would concatenate genetics + type 2 diabetes? could you please [edit] your post and copy the data so we don't have to retype it? – Ricardo Diaz Oct 28 '20 at 03:02
  • Done, sorry about that. And yes, Cell 9 would concatenate genetics + type 2 diabetes. – Rekigama Oct 28 '20 at 03:12

2 Answers2

0

This is my approach.

There's room for a lot of improvements, but is a rough code that should get you started.

Read the code's comments and adapt it to fit your needs.


EDIT: I updated the code to match the sample worksheet you uploaded, build the first column range dinamically, validate if commas appear in the first column cell so no error is raised.

As I said in the comments, it's better easier to debug if you call one procedure from the other, instead of merging them.

Code:

Option Explicit

Public Sub StoreConcatenate()

    ' Basic error handling
    On Error GoTo CleanFail
    
    ' Define general parameters
    Dim targetSheetName As String
    targetSheetName = "Test space" ' Sheet holding the data
    
    Dim firstColumnLetter As String
    firstColumnLetter = "C" ' First column holding the numbers
    
    Dim firstColumnStartRow As Long
    firstColumnStartRow = 7
    
    ' With these three parameters we'll build the range address holding the first column dynamically
    
    ' Set reference to worksheet
    Dim targetSheet As Worksheet
    Set targetSheet = ThisWorkbook.Worksheets(targetSheetName)
    
    ' Find last row in column (Modify on what column)
    Dim firstColumnlastRow As Long
    firstColumnlastRow = targetSheet.Cells(targetSheet.Rows.Count, firstColumnLetter).End(xlUp).Row
    
    ' Build range of first column dinamically
    Dim firstColumnRange As Range
    Set firstColumnRange = targetSheet.Range(firstColumnLetter & firstColumnStartRow & ":" & firstColumnLetter & firstColumnlastRow)
    
    ' Loop through first column range cells
    Dim valueCell As Range
    For Each valueCell In firstColumnRange
        
        ' Check if cell contains "," in the second position in string
        If InStr(valueCell.Value, ",") = 2 Then
        
            ' Store first digit of cell before ","
            Dim firstDigit As Integer
            firstDigit = Split(valueCell.Value, ",")(0)
            
            
            ' Check if cell contains "," in the fourth position in string
            If InStr(3, valueCell.Value, ",") = 4 Then
            
                ' Store second digit of cell after ","
                Dim secondDigit As Integer
                secondDigit = Split(valueCell.Value, ",")(1)
            
            End If
            
            ' Store second colum type
            Dim secondColumnType As String
            secondColumnType = valueCell.Offset(, 1).Value
            
            ' Store third column value
            Dim thirdColumnValue As String
            thirdColumnValue = valueCell.Offset(, 2).Value
            
            ' Store nodes values (first digit and second column type)
            Select Case secondColumnType
            Case "Node"
                Dim nodeValues() As Variant
                Dim nodeCounter As Long
                ReDim Preserve nodeValues(nodeCounter)
                
                nodeValues(nodeCounter) = Array(firstDigit, thirdColumnValue)
                
                nodeCounter = nodeCounter + 1
            Case "Arrow"
                Dim matchedNodeFirstValue As String
                Dim matchedNodeSecondValue As String
                matchedNodeFirstValue = IsInArrayReturnItem(firstDigit, nodeValues)(1)
                matchedNodeSecondValue = IsInArrayReturnItem(secondDigit, nodeValues)(1)
                If matchedNodeFirstValue <> vbNullString And matchedNodeSecondValue <> vbNullString Then
                    valueCell.Offset(, 3).Value = matchedNodeFirstValue & Space(1) & thirdColumnValue & Space(1) & matchedNodeSecondValue
                End If
            End Select
        
        End If
    
    Next valueCell
    
CleanExit:
    Exit Sub
 
CleanFail:
    Debug.Print "Something went wrong: " & Err.Description
    Resume CleanExit
    
End Sub

' Credits: https://stackoverflow.com/a/38268261/1521579
Public Function IsInArrayReturnItem(stringToBeFound As Integer, arr As Variant) As Variant
    Dim i
    For i = LBound(arr) To UBound(arr)
        If arr(i)(0) = stringToBeFound Then
            IsInArrayReturnItem = arr(i)
            Exit Function
        End If
    Next i
    IsInArrayReturnItem = Array(vbNullString, vbNullString)
End Function

Let me know if it works

Ricardo Diaz
  • 5,658
  • 2
  • 19
  • 30
  • I modified the code so I could use it in a for loop, as I'll need to use this macro on a large number of urls. It's working, but not as intended. It's returning every arrow as "type 2 diabetes + type 2 diabetes", which makes me think that the firstDigit isn't being updated properly. – Rekigama Oct 28 '20 at 07:52
  • You can run it step by step using F8 key. You can call this procedure from the one that you already have, so you dont have that big long macro, and can debug it more easily. I tested it with a short range. Maybe you can upload a sample workbook and share the link. I can take a look at it. – Ricardo Diaz Oct 28 '20 at 11:10
  • https://drive.google.com/file/d/11pj3lBf1rZ0xKIZ8-fz-leppbCKQmQnt/view?usp=sharing Here's the sheet I was working in. I'll give what you suggested a shot. – Rekigama Oct 28 '20 at 11:25
  • @Rekigama Check the updated code and let me know how it goes – Ricardo Diaz Oct 28 '20 at 12:30
0

It appears that you are concatenating the lookups based on the

  • first and second integers,
  • where the second column = "Arrow"

If that is the case, I suggest:

  • Read the data table into a VBA array for faster processing
    • I am assuming your data is ordered as you show it, with all the Node entries at the start.
    • if that is not the case, then loop twice -- once to find the Nodes, and second time to concatenate the Arrow data.
  • Read the diagnoses into a dictionary for fact lookup.
  • if column2 = "Arrow" then concatenate the lookups of the first and second integers
  • Write back the data

Note: As written, this will overwrite the original table destroying any formulas that might be there. If needed, you could easily modify it to only overwrite the necessary area.

Note2 Be sure to set a reference (under Tools/References) to Microsoft Scripting Runtime, or change the Dictionary declaration to late-binding.

Regular Module

'set reference to Microsoft Scripting Runtime
Option Explicit
Sub Dx()
    Dim WS As Worksheet
    Dim rngData As Range, c As Range, vData As Variant
    Dim dDx As Dictionary
    Dim I As Long, sKey As String, dxKeys As Variant
    
'Get the data range
Set WS = ThisWorkbook.Worksheets("sheet1")
With WS

    'assume table starts in A1 and is three columns wide
    Set rngData = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp)).Resize(columnsize:=3)
    
    'read into variant array for faster processing
    vData = rngData
End With

'create dictionsry for dx lookups
Set dDx = New Dictionary
For I = 2 To UBound(vData, 1)
    Select Case vData(I, 2)
        Case "Node"
            sKey = Split(vData(I, 1), ",")(0) 'first comma-separated number
            If dDx.Exists(sKey) Then
                MsgBox "duplicate diagnostic key. Please correct the data"
                Exit Sub
            End If
    
            dDx.Add Key:=sKey, Item:=vData(I, 3)
            
        Case "Arrow"
            dxKeys = Split(vData(I, 1), ",")
            vData(I, 3) = dDx(dxKeys(0)) & " + " & dDx(dxKeys(1))
    End Select
Next I

'reWrite the table
Application.ScreenUpdating = False
rngData = vData
    
End Sub

enter image description here

Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60