1

At the request of a user, I have rewritten this question with more information and tried to clarify it as much as I possibly can.

I have code that reads a range into an array. Many calculations are performed. The resulting array contains an ID and two values:

ID   Seq   Value
a    1     100
a    2     150
a    3     200
b    1     10
b    2     10
b    3     10

However, the calculation step uses Redim Preserve so I have to store the array as TestArray(1 To 3, 1 To 6).

I need to filter the array for duplicate ID's.

If there is no duplicate, I need to store ID, seq and value.

If there is a duplicate ID, I need to store the ID, seq and value where value is the maximum value for a given ID.

If there is a duplicate ID and there are multiple instances of a maximum value, I want to keep the ID, date and value where the value is the maximum value for a given ID and seq is the minimum seq for a given ID.

Basically, for each ID I want the maximum value and if there are multiple maximums, default to the earliest sequence number.

This is a sample of code that shows how the array is structured and what I need the results to look like.

Sub TestArray()

  Dim TestArray() As Variant
  Dim DesiredResults() As Variant

  TestArray = Array(Array("a", "a", "a", "b", "b", "b"), _
    Array(1, 2, 3, 1, 2, 3), _
    Array(100, 150, 200, 10, 10, 10))
  DesiredResults = Array(Array("a", "b"), Array(3, 1), Array(200, 10))

End Sub

Is there some way to loop through the array and find duplicates and then compare them? I could do this easily in SQL but I am struggling in VBA.

Jeffrey Kramer
  • 1,345
  • 6
  • 25
  • 43
  • it would be easier to check for duplicates while its still a range in a sheet – Mr.Monshaw Aug 27 '13 at 19:08
  • is that an option to it first? thats what i meant – Mr.Monshaw Aug 27 '13 at 19:09
  • No unfortunatley not. I have to pull the raw data into VBA and do a lot of processing because it's an unreliable database output that I can't control. – Jeffrey Kramer Aug 27 '13 at 19:11
  • this is certainly possible and I will post a solution later (I have to pop out now) but can you please confirm whether or not the data is meant to be in a jagged array? The structure you describe is not a jagged array, but the test data of TestArray is. – Cor_Blimey Aug 27 '13 at 19:15
  • Oh, definitely not jagged, I was just trying to input some real data to show what I was trying to get since it pulls from a worksheet range. It's just a normal array in the application. Thanks! – Jeffrey Kramer Aug 27 '13 at 19:16
  • 1
    To find/eliminate duplicates you can use a Dictionary object as explained here: http://stackoverflow.com/questions/915317/does-vba-have-dictionary-structure It allows you to "Add" all IDs as "Keys". In the end, you get a collection of unique keys. – Axel Kemper Aug 27 '13 at 19:25
  • That was my first thought but I'm unclear on how to resolve *which* duplicate to keep using the dictionary approach. – Jeffrey Kramer Aug 27 '13 at 19:31
  • You should go back and delete this: http://stackoverflow.com/questions/18473001/remove-duplicates-with-condition-in-vba-array – Johnny Bones Aug 27 '13 at 19:38
  • do you want the output array in the same (0 to 3, 0 to 5) format or in a (0 to 5, 0 to 3)? What base? – Cor_Blimey Aug 27 '13 at 19:39
  • It's fine as (0 to 3, 0 to 5), I'm more confused on the mechanics than anything else. Thanks so much! – Jeffrey Kramer Aug 27 '13 at 19:46
  • @JohnnyBones Sorry, thought I had already. – Jeffrey Kramer Aug 27 '13 at 19:46

1 Answers1

5

I kept my test code in so you can inspect the results and play around. I commented why certain things are being done - hope it helps.

The return array is base 1, in the format (column, row). You can of course change this.

Option Explicit

Public Sub TestProcess()

    Dim testResults
    testResults = GetProcessedArray(getTestArray)
    With ActiveSheet
        .Range( _
            .Cells(1, 1), _
            .Cells( _
                1 + UBound(testResults, 1) - LBound(testResults, 1), _
                1 + UBound(testResults, 2) - LBound(testResults, 2))) _
            .Value = testResults
    End With

End Sub

Public Function GetProcessedArray(dataArr As Variant) As Variant

    Dim c As Collection
    Dim resultsArr
    Dim oldResult, key As String
    Dim i As Long, j As Long, lb1 As Long

    Set c = New Collection
    lb1 = LBound(dataArr, 1) 'just cache the value of the lower bound as we use it a lot

    For j = LBound(dataArr, 2) To UBound(dataArr, 2)

        'extract current result for the ID, if any
        '(note that if the ID's aren't necessarily the same type you can add
        ' the key with  prefix of VarType or TypeName as something like key = CStr(VarType(x)) & "|" & CStr(x))
        key = CStr(dataArr(lb1 + 0, j))
        On Error Resume Next
        oldResult = c(key)

        If Err.Number = 5 Then 'error number if record does not exist

            On Error GoTo 0
            'record doesn't exist so add it
            c.Add Array( _
                key, _
                dataArr(lb1 + 1, j), _
                dataArr(lb1 + 2, j)), _
                key

        Else

            On Error GoTo 0
            'test if new value is greater than old value
            If dataArr(lb1 + 2, j) > oldResult(2) Then
                'we want the new one, so:
                'Collection.Item reference is immutable so remove the record
                c.Remove key
                'and Add the new one
                c.Add Array( _
                    key, _
                    dataArr(lb1 + 1, j), _
                    dataArr(lb1 + 2, j)), _
                    key
            ElseIf dataArr(lb1 + 2, j) = oldResult(2) Then
                'test if new sequence number is less than old sequence number
                If dataArr(lb1 + 1, j) < oldResult(1) Then
                    'we want the new one, so:
                    'Collection.Item reference is immutable so remove the record
                    c.Remove key
                    'and Add the new one
                    c.Add Array( _
                        key, _
                        dataArr(lb1 + 1, j), _
                        dataArr(lb1 + 2, j)), _
                        key
                End If
            End If

        End If

    Next j

    'process results into the desired array format
    ReDim resultsArr(1 To 3, 1 To c.Count)
    For j = 1 To c.Count
        For i = 1 To 3
            resultsArr(i, j) = c(j - LBound(resultsArr, 2) + 1)(i - LBound(resultsArr, 1))
        Next i
    Next j

    GetProcessedArray = resultsArr

 End Function

Private Function getTestArray()

  Dim testArray() As Variant
  Dim flatArray
  Dim i As Long
  ReDim flatArray(0 To 2, 0 To 5)

  testArray = Array( _
    Array("a", "a", "a", "b", "b", "b"), _
    Array(1, 2, 3, 1, 2, 3), _
    Array(100, 150, 200, 10, 10, 10))

  For i = 0 To 5

    flatArray(0, i) = testArray(0)(i)
    flatArray(1, i) = testArray(1)(i)
    flatArray(2, i) = testArray(2)(i)

  Next i

  getTestArray = flatArray

End Function
Cor_Blimey
  • 3,260
  • 1
  • 14
  • 20
  • 1
    This is insane, THANK YOU! I will learn a lot from breaking down how this works and understanding it. – Jeffrey Kramer Aug 27 '13 at 20:10
  • @JeffreyKramer you're welcome - let me know if it doesn't work on the full data set! I did make an assumption that all ID's are strings (or at least that the default string conversion is ok), and the odd other small assumption (like an err.number <>5 means the oldResult was retrieved ok). But it should be alright... – Cor_Blimey Aug 27 '13 at 20:15
  • Yes, it's working fine for the whole set. This is really interesting to know, because this problem has vexed me for a while. I could always filter for duplicates but I never managed to connect how to do it in conjunction with other criteria. This isn't something that comes up often but it's a huge hassle when it does and this will handle it great. I can also expand upon it to take care of a lot of other stuff. – Jeffrey Kramer Aug 27 '13 at 20:18