2

I have a dataset that is multiple strings and I want a unique count of the occurrences so I can review and refine my datasets. I've been unable to do this using formulas so went over to VBA, but hit a roadblock as I'm an amateur.

My data looks like this...

enter image description here

I want it to return this...

enter image description here

I've tried parsing it with text to columns, but in large datasets I have 60 columns with 100s of hits in my string. Therefore, transposing it then trying to get a count of uniques would be daunting.

Therefore, I was hoping VBA would help, but I can only seem to get a function and not with a Sub and Function to transpose then count. Something like below...

Sub Main()
    Dim filename As String
    Dim WorksheetName As String
    Dim CellRange As String
    
    Sheets.Add.Name = "ParsedOutput"

'==============================================================
' CHANGE THESE VALUES FOR YOUR SHEET   
WorksheetName =   
CellRange =    
'==============================================================
   
    ' Get range
    Dim Range
    Set Range = ThisWorkbook.Worksheets(WorksheetName).Range(CellRange)

    ' Copy range to avoid overwrite
    Range.Copy _
        Destination:=ThisWorkbook.Worksheets("ParsedOutput").Range("A1")
        
    ' Get copied exclusions
    Dim Copy
    Set Copy = ThisWorkbook.Worksheets("ParsedOutput").Range("A:A")
    
    ' Parse and overwrite
    Copy.TextToColumns _
        Destination:=Range("A:A"), _
        DataType:=xlDelimited, _
        ConsecutiveDelimiter:=True, _
        Comma:=True

End Sub

Option Explicit

Public Function Counter(InputRange As Range) As String

Dim CellValue As Variant, UniqueValues As New Collection

Application.Volatile

'For error Handling On Error Resume Next

'Looping through all the cell in the defined range For Each CellValue In InputRange
    UniqueValues.Add CellValue, CStr(CellValue)  ' add the unique item Next

'Returning the count of number of unique values CountUniqueValues = UniqueValues.Count

End Function
VBasic2008
  • 44,888
  • 5
  • 17
  • 28
BigOleNewb
  • 35
  • 4

6 Answers6

4

For the sake of simplicity, I will take minimal data to demostrate how to achieve what you want. Feel free to change the code to suit your needs.

Excel Sheet

Let's say our worksheet looks like this

enter image description here

Logic:

  1. Find last row and last column as shown HERE and construct your range.
  2. Store the values of that range in an array.
  3. Loop through each item in that array and extract words based of , as a delimiter and store it in the collection. If the delimiter doesnt exist then store the entire word in the collection. To create a unique collection, we use On Error Resume Next as shown in the code below.
  4. Based on the count of words in the collection, we create an 2D array for output. One part of the array will hold the word and the other part will hold the count of occurences.
  5. Use .Find and .FindNext to count the occurence of a word in the range and then store it in array.
  6. Write the array in one go to the relevant cell. For demonstration purpose, I will write to Column D

Code

I have commented the code so you should not have a problem understanding it but if you do then simply ask.

Option Explicit

Sub Sample()
    Dim ws As Worksheet
    
    '~~> Change this to relevant sheet
    Set ws = Sheet1
    
    Dim LastRow As Long, LastColumn As Long
    Dim i As Long, j As Long, k As Long
    Dim col As New Collection
    Dim itm As Variant, myAr As Variant, tmpAr As Variant
    Dim OutputAr() As String
    Dim aCell As Range, bCell As Range, rng As Range
    Dim countOfOccurences As Long
    
    With ws
        '~~> Find last row
        LastRow = .Cells.Find(What:="*", _
                  After:=.Range("A1"), _
                  Lookat:=xlPart, _
                  LookIn:=xlFormulas, _
                  SearchOrder:=xlByRows, _
                  SearchDirection:=xlPrevious, _
                  MatchCase:=False).Row
        
        '~~> Find last column
        LastColumn = .Cells.Find(What:="*", _
                     After:=.Range("A1"), _
                     Lookat:=xlPart, _
                     LookIn:=xlFormulas, _
                     SearchOrder:=xlByColumns, _
                     SearchDirection:=xlPrevious, _
                     MatchCase:=False).Column
                     
        '~~> Construct your range
        Set rng = .Range(.Cells(1, 1), .Cells(LastRow, LastColumn))
        
        '~~> Store the value in an array
        myAr = rng.Value2
        
        '~~> Create a unique collection
        For i = LBound(myAr) To UBound(myAr)
            For j = LBound(myAr) To UBound(myAr)
                If Len(Trim(myAr(i, j))) <> 0 Then
                    '~~> Check data has "," delimiter
                    If InStr(1, myAr(i, j), ",") Then
                        tmpAr = Split(myAr(i, j), ",")
                        
                        For k = LBound(tmpAr) To UBound(tmpAr)
                            On Error Resume Next
                            col.Add tmpAr(k), CStr(tmpAr(k))
                            On Error GoTo 0
                        Next k
                    Else
                        On Error Resume Next
                        col.Add myAr(i, j), CStr(myAr(i, j))
                        On Error GoTo 0
                    End If
                End If
            Next j
        Next i
        
        '~~> Count the number of items in the collection
        i = col.Count
        
        '~~> Create output array for storage
        ReDim OutputAr(1 To i, 1 To 2)
        i = 1
        
        '~~> Loop through unique collection
        For Each itm In col
            OutputAr(i, 1) = Trim(itm)
            countOfOccurences = 0
            
            '~~> Use .Find and .Findnext to count for occurences
            Set aCell = rng.Find(What:=OutputAr(i, 1), LookIn:=xlValues, _
                Lookat:=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, _
                MatchCase:=False, SearchFormat:=False)
        
            If Not aCell Is Nothing Then
                Set bCell = aCell
                countOfOccurences = countOfOccurences + 1
                Do
                    Set aCell = rng.FindNext(After:=aCell)
        
                    If Not aCell Is Nothing Then
                        If aCell.Address = bCell.Address Then Exit Do
                        countOfOccurences = countOfOccurences + 1
                    Else
                        Exit Do
                    End If
                Loop
            End If
            
            '~~> Store count in array
            OutputAr(i, 2) = countOfOccurences
            i = i + 1
        Next itm
        
        '~~> Output it to relevant cell
        .Range("D1").Resize(UBound(OutputAr), 2).Value = OutputAr
    End With
End Sub

Output

enter image description here

Siddharth Rout
  • 147,039
  • 17
  • 206
  • 250
  • This was a really well written answer and explanation but I got an error on "If Len(Trim(myAr(i, j))) <> 0 Then". I wasn't following exactly though as it appears you created a secondary range of strings that would need to be parsed requiring the array and LBound & UBound. Admittedly I'm not 100% sure on that. – BigOleNewb Oct 13 '20 at 16:11
  • 1
    Nice logic and solution – Ricardo Diaz Oct 13 '20 at 16:19
3

The following is a rough approach, and is open to tons of improvements, but should get you started.

Read the comments and adjust the code to fit your needs.

Option Explicit


Public Sub CountWordsInColumn()
    
    ' Adjust to set the sheet holding the data
    Dim sourceSheet As Worksheet
    Set sourceSheet = ThisWorkbook.Worksheets("DataSet")
    
    ' Adjust the column and row that contains the hits
    Dim hitsColumn As String
    Dim hitsStartRow As Long
    Dim lastRow As Long
    hitsColumn = "C"
    hitsStartRow = 2
    lastRow = sourceSheet.Cells(sourceSheet.Rows.Count, hitsColumn).End(xlUp).Row
    
    ' Adjust the column that contains the hits
    Dim sourceRange As Range
    Set sourceRange = sourceSheet.Range(hitsColumn & hitsStartRow & ":" & hitsColumn & lastRow)
    
    ' Add values in each cell split by ,
    Dim evalCell As Range
    Dim splitValues As Variant
    Dim counter As Long
    ReDim splitValues(lastRow - hitsStartRow)
    For Each evalCell In sourceRange
    
        splitValues(counter) = Split(evalCell.Value, ",")
        
        counter = counter + 1
        
    Next evalCell
    
    ' Get all values into an array
    Dim allValues As Variant
    allValues = AddValuesToArray(splitValues)
    
    ' Get unique values into an array
    Dim uniqueValues As Variant
    uniqueValues = GetUniqueValues(allValues)
    
    ' Count duplicated values from unique array
    Dim outputData As Variant
    outputData = CountValuesInArray(uniqueValues, allValues)
    
    ' Add new sheet
    Dim outputSheet As Worksheet
    Set outputSheet = ThisWorkbook.Sheets.Add
    PrintArrayToSheet outputSheet, outputData

End Sub

Private Function AddValuesToArray(ByVal myArray As Variant) As Variant

    Dim counter As Long
    Dim tempArray As Variant
    Dim tempCounter As Long
    Dim tempArrayCounter As Long
    
    ReDim tempArray(0)
    
    For counter = 0 To UBound(myArray)
        
        For tempCounter = 0 To UBound(myArray(counter))
            
            tempArray(tempArrayCounter) = myArray(counter)(tempCounter)
            
            tempArrayCounter = tempArrayCounter + 1
            
            ReDim Preserve tempArray(tempArrayCounter)
        
        Next tempCounter
    
    Next counter
    
    ReDim Preserve tempArray(tempArrayCounter - 1)
    
    AddValuesToArray = tempArray

End Function

Private Function GetUniqueValues(ByVal tempArray As Variant) As Variant
    Dim tempCol As Collection
    Set tempCol = New Collection
    
    On Error Resume Next
    Dim tempItem As Variant
    For Each tempItem In tempArray
        tempCol.Add tempItem, CStr(tempItem)
    Next
    On Error GoTo 0
    
    Dim uniqueArray As Variant
    Dim counter As Long
    ReDim uniqueArray(tempCol.Count - 1)
    For Each tempItem In tempCol
        uniqueArray(counter) = tempCol.Item(counter + 1)
        counter = counter + 1
    Next tempItem
    GetUniqueValues = uniqueArray
    
End Function

Function CountValuesInArray(ByVal uniqueArray As Variant, ByVal allValues As Variant) As Variant
    
    Dim uniqueCounter As Long
    Dim allValuesCounter As Long
    Dim ocurrCounter As Long
    Dim outputData As Variant
    
    ReDim outputData(UBound(uniqueArray))
    
    For uniqueCounter = 0 To UBound(uniqueArray)
    
        For allValuesCounter = 0 To UBound(allValues)
        
            If uniqueArray(uniqueCounter) = allValues(allValuesCounter) Then ocurrCounter = ocurrCounter + 1
        
        Next allValuesCounter
        
        ' This is the output
        Debug.Print uniqueArray(uniqueCounter), ocurrCounter
        outputData(uniqueCounter) = Array(uniqueArray(uniqueCounter), ocurrCounter)
        
        ocurrCounter = 0
    
    Next uniqueCounter
    
    CountValuesInArray = outputData
    
End Function

Private Sub PrintArrayToSheet(ByVal outputSheet As Worksheet, ByVal outputArray As Variant)

    Dim counter As Long
    
    For counter = 0 To UBound(outputArray)
    
        outputSheet.Cells(counter + 1, 1).Value = outputArray(counter)(0)
        outputSheet.Cells(counter + 1, 2).Value = outputArray(counter)(1)
    
    Next counter
End Sub
Ricardo Diaz
  • 5,658
  • 2
  • 19
  • 30
  • This was an amazing answer. Followed my example output exactly. I've already made a few tweaks but it worked like a charm the very first run. I've been re-reading through the code a bit to learn what you did also. Thanks! – BigOleNewb Oct 13 '20 at 16:13
  • Glad it helped. I used arrays for simplicity, but you could use collections and dictionaries and that would speed up the process. Cheers! – Ricardo Diaz Oct 13 '20 at 16:15
2

Try,

It is convenient to use Dictionary to extract duplicate items.

Sub test()
    Dim Ws As Worksheet, wsResult As Worksheet
    Dim vDB, vSplit, v
    Dim Dic As Object 'Scripting.Dictionary
    Dim i As Long, n As Long
    
    Set Dic = CreateObject("Scripting.Dictionary")
    
    Set Ws = Sheets(1) 'ActiveSheet
    vDB = Ws.Range("a1").CurrentRegion
    
    For i = 2 To UBound(vDB, 1)
        vSplit = Split(vDB(i, 3), ",")
        For Each v In vSplit
            If Dic.Exists(v) Then
                Dic(v) = Dic.Item(v) + 1
            Else
                Dic.Add v, 1
            End If
        Next v
    Next i
        
    Set wsResult = Sheets(2)
    n = Dic.Count
    With wsResult
        .UsedRange.Clear
        .Range("a1").Resize(n) = WorksheetFunction.Transpose(Dic.Keys)
        .Range("b1").Resize(n) = WorksheetFunction.Transpose(Dic.Items)
    End With
        
End Sub
Dy.Lee
  • 7,527
  • 1
  • 12
  • 14
1

For all who won't use VBA. Here a solution with PowerQuery:

    Quelle = Excel.CurrentWorkbook(){[Name="tbl_Source"]}[Content],
    Change_Type = Table.TransformColumnTypes(Quelle,{{"ID", Int64.Type}, {"Record", type text}, {"Hits", type text}}),
    Split_Hits = Table.ExpandListColumn(Table.TransformColumns(Change_Type, {{"Hits", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Hits"),
    Clean_Spaces = Table.ReplaceValue(Split_Hits," ","",Replacer.ReplaceText,{"Hits"}),
    Group_Rows = Table.Group(Clean_Spaces, {"Hits"}, {{"Count", each Table.RowCount(_), Int64.Type}})
in
    Group_Rows

enter image description here

Chris
  • 933
  • 1
  • 6
  • 16
1

Approach simulating newer TextJoin and Unique functions

In order to complete the above solutions, I demonstrate an approach using

  • [1]a) a replacement of the TextJoin function (available since vers. 2019, MS 365 ~> the newer function code is commented out,btw),
  • [1]b) the FilterXML() function to get unique words (available since vers. 2013+) and
  • [3]a) a negative filtering to calculate results
Sub wordCounts()
'[0]define data range
With Sheet3
    Dim lastRow As Long: lastRow = .Range("A" & .Rows.Count).End(xlUp).Row
    Dim rng As Range: Set rng = .Range("A2:A" & lastRow)
End With
With WorksheetFunction
'[1]split a) available and b) unique words into arrays
'   Dim words:   words = Split(.TextJoin(",", True, rng), ",")   ' (available vers. 2019+ or MS 365)
    Dim words:   words = Split(Join(.Transpose(rng), ","), ",")  '

    Dim uniques: uniques = UniqueXML(words)                      ' (already since vers. 2013+)
    
'[2]provide for calculation
    'fill temporary array with words
    Dim tmp: tmp = words
    'declare cnt array for counting results
    Dim cnt: ReDim cnt(0 To UBound(uniques), 0 To 0)
    Dim old As Long: old = UBound(tmp) + 1      ' remember original size
'[3]get word counts
    Dim elem
    For Each elem In uniques
        'a) filter out current elem
            tmp = Filter(tmp, elem, False)
            Dim curr As Long: curr = UBound(tmp) + 1
        'b) count number of words (as difference of filtered tmp boundaries) ...
            Dim n As Long: n = old - curr
        '   ... and remember latest array boundary
            old = curr
        'c) assign results to array cnt
            Dim i As Long: cnt(i, 0) = n
            i = i + 1                       ' increment counter
    Next elem
'[4]write word counts to target
    rng.Offset(0, 2).Resize(UBound(uniques), 1) = .Transpose(uniques)
    rng.Offset(0, 3).Resize(UBound(cnt), 1) = cnt
End With

End Sub

Help function UniqueXML()

Function UniqueXML(arr, Optional Delim As String = ",", Optional ZeroBased As Boolean = False)
  ' Purp: return unique list of array items
  ' Note: optional argument Delim defaulting to colon (",")
  ' Help: https://learn.microsoft.com/de-de/office/vba/api/excel.worksheetfunction.filterxml
  ' [1] get array data to xml node structure (including root element)
    Dim wellformed As String
    wellformed = "<root><i>" & Join(arr, "</i><i>") & "</i></root>"
  ' ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ' [2] define XPath string searching unique item values
  ' Note: c.f. udf: https://stackoverflow.com/questions/58677041/vba-excel-how-to-display-non-equal-values-in-an-excel-array/58685756#58685756
  ' ------------------------------------------------
  ' //i                    ... all <i> node values after the DocumentElement
  ' [not( .=preceding::i)] ... only if not preceded by siblings of the same node value
  ' ------------------------------------------------
    Dim myXPath As String
    myXPath = "//i[not( .=preceding::i)]"
   
  ' [3] get "flat" 1-dim array (~> one-based!)
    Dim tmp As Variant
    tmp = Application.Transpose(WorksheetFunction.FilterXML(wellformed, myXPath))
  ' [3a] optional redim as zero-based array
    If ZeroBased Then ReDim Preserve tmp(LBound(tmp) - 1 To UBound(tmp) - 1)
        
  ' [4] return function result
    UniqueXML = tmp
End Function
T.M.
  • 9,436
  • 3
  • 33
  • 57
0

I didn't understand the problem you have between sub or function; however, this is a function that counts the unique values in a range

    Public Function Counter(InputRange As Variant) As Variant
    
        Dim UniqueValues As New Collection
        Dim Val As Variant
        Dim Cell As Range
        Dim I As Long
        
        Application.Volatile
        
        On Error Resume Next
        For Each Cell In InputRange
            Val = Split(Cell, ",")
            If IsArray(Val) Then
                For I = LBound(Val) To UBound(Val)
                    UniqueValues.Add Val(I), CStr(Val(I))
                Next I
            Else
                UniqueValues.Add Val, CStr(Val)
            End If
        Next Cell
        On Error GoTo 0
        Counter = UniqueValues.Count
    
    End Function
Zer0Kelvin
  • 334
  • 2
  • 7
  • I would have also used this approach of the collection but would have combined with `.Find and .FindNext` to get the result. Let me see if I can quiickly come up with an example. – Siddharth Rout Oct 13 '20 at 05:03