0

Working with large datasets (> 200k rows, > 50 columns). One of the columns contains case numbers, which I extract to a different sheet and remove duplicates to leave only distinct values. For each of these distinct case numbers, I need to find the corresponding maximum value from several other columns in the original dataset (these max values are used in several calculations).

Originally I was looping through the list of distinct case numbers, and using the Application.MaxIfs function against the original dataset. Calling the sheet that often was slowing the loop to a crawl. Currently I've switched to using named ranges stored in memory in hopes of speeding the process up, but it's still taking > 30 minutes to complete.

Is there a more efficient way to accomplish this task? I thought maybe the dictionary method mentioned here may work, but I can't seem to adapt it to find the maximum values. Am I even using the named ranges correctly? I've posted what should be the relevant section of code below. Thank you in advance for any assistance (Note I tried to only include the setting of relevant variables).

    Sub Summarize_Data()
    
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual

    Dim CaseRawRange As Variant
    Dim CaseUniqueRange As Variant
    Dim CalculationsArray as Variant
    Dim DateRange1 as Variant
    Dim DateRange2 as Variant
    Dim DateRange3 as Variant

    CaseRawRange = RawDataWorksheet.Range(RawDataWorksheet.Cells(1, CaseColumn), RawDataWorksheet.Cells(LastRowRawData, CaseColumn)).Value
    CaseUniqueRange = CalculationsWorksheet.Range(CalculationsWorksheet.Cells(1, 1), CalculationsWorksheet.Cells(RMACaseUniqueCount, 1)).Value
    DateRange1 = RawDataWorksheet.Range(RawDataWorksheet.Cells(1, DateColumn1), RawDataWorksheet.Cells(LastRowRawData, DateColumn1)).Value
    DateRange2 = RawDataWorksheet.Range(RawDataWorksheet.Cells(1, DateColumn2), RawDataWorksheet.Cells(LastRowRawData, DateColumn2)).Value
    DateRange3 = RawDataWorksheet.Range(RawDataWorksheet.Cells(1, DateColumn3), RawDataWorksheet.Cells(LastRowRawData, DateColumn3)).Value

    'Find max date values for each unique case number        
    For k = 2 To UniqueCaseCount
        MaxDate1 = 0
        MaxDate2 = 0
        MaxDate3 = 0
                For h = 2 To LastRowRawData
                    If CaseUniqueRange(k, 1) = CaseRawRange(h, 1) And DateRange1(h, 1) > MaxDate1 Then
                    MaxDate1 = DateRange1(h, 1)
                    Else
                    MaxDate1 = MaxDate1
                    End If

                    If CaseUniqueRange(k, 1) = CaseRawRange(h, 1) And DateRange2(h, 1) > MaxDate2 Then
                    MaxDate2 = DateRange2(h, 1)
                    Else
                    MaxDate2 = MaxDate2
                    End If

                    If CaseUniqueRange(k, 1) = CaseRawRange(h, 1) And DateRange3(h, 1) > MaxDate3 Then
                    MaxDate3 = DateRange3(h, 1)
                    Else
                    MaxDate3 = MaxDate3
                    End If
                Next h

            'Perform metrics calculations for each unique case number based on max date values
            If MaxDate1 = 0 or MaxDate1 > MaxDate2 Then
                CalculationsArray(k, 2) = "FALSE"
            Else
                CalculationsArray(k, 2) = "TRUE"
            End If

            If MaxDate2 = MaxDate3 or MaxDate3 = 0 Then
                CalculationsArray(k, 3) = "FALSE"
            Else
                CalculationsArray(k, 3) = "TRUE"
            End If

            If MaxDate1 > MaxDate3 Then
                CalculationsArray(k, 4) = "FALSE"
            Else
                CalculationsArray(k, 4) = "TRUE"
            End If

    Next k

    'Paste entire calculations array into sheet
    CalculationsWorksheet.Range("B2").Resize(UBound(CalculationsArray, 1) - 1, UBound(CalculationsArray, 2)).Value = CalculationsArray

    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic

    End Sub

Raw Data Snippet

teroot
  • 1
  • 1
  • It seems like you're looping when you can just `CountIf()/CountIfs()` and append it to the sheet... heck, for the sake of it, try the helper column. There's also the option to move all data to another sheet and filter. – Cyril Sep 14 '22 at 18:05
  • 1
    What are you doing with `MaxDate` anyways? It seems that you're looping to continuum then immediately resetting it back to `0`. Either you've truncated the code and we're not seeing where other hangups may occur, or you're just looping to loop, where we cannot see what `UniqueCaseCount` is, among other things. – Cyril Sep 14 '22 at 18:09
  • @Cyril UniqueCaseCount is a count of the distinct # of cases - they are listed in a separate sheet from the original raw data. I'm taking each distinct case number from that separate sheet, and looping through a date column in the raw data to find the max corresponding date (there are multiple entries per case number in the raw data that have different date values) . There are actually several different date columns that I perform this same operation on, and those date values are used in metric calculations later within the loop. I reset it back to zero each time I move to a new distinct case – teroot Sep 14 '22 at 19:26
  • Why not just use the `Unique` function? It would probably help to see your data. – pgSystemTester Sep 14 '22 at 20:14
  • Don't use `variant` to store whole numbers if you can avoid it. Use `Long` for any numbers that will be under 2 billion (including dates). [Explanation here](https://stackoverflow.com/a/51689021/11732320) – pgSystemTester Sep 14 '22 at 20:16
  • I have modified the code in the original post to include more details on what is happening in the loops and the metrics being calculated. I have also attached a screenshot of an example of the raw data @pgSystemTester those values are being stored as arrays in memory instead of referencing the sheet for calculations. I don't think Long would work for that application would it? – teroot Sep 14 '22 at 21:04
  • You can store arrays as long `arrayLong() As Long` but it does require some manipulation with excel which might offset the speed. – pgSystemTester Sep 14 '22 at 22:08

0 Answers0