0

In my line of work, one thing that comes up very often is the need to perform median, quartile, and percentile operations on datasets. I'm also forced to use excel (not my choice), and I'm also forced to use pivot tables for functionality reasons ('dem slicers, hot damn).

Excel (at least Excel 2010) does not have have this functionality available in it's pivot tables. Some add-ons, such as power pivot apparently add this, but not everyone has work environments where you can install things whenever. There are some other notable workarounds that:

These workarounds are great, but when I have 100,000+ lines of data, it starts taking some time to crunch on my toaster-computer - and the whole reason I wanted pivot tables is so I could have a slick slicer-driven dashboard that was pretty and responsive. Also importantly, using Excel's conditionals and formulas cracks it when you get higher than 65k+ data points, even though the worksheet can go up to 1M rows of data.

So basically, I want a robust and reliable method of calculating percentile data for a given chosen variable in the pivot table, and I want it to not move at a glacial pace. I also want it to work when the data sub-category is >65K rows.

lolsail
  • 93
  • 1
  • 8

1 Answers1

1

My solution to my problem is coded in VBA. There will be two sections to this answer:

  • The conceptual explanation of what I'm doing
  • The code, which I'll paste at the bottom and out of the way.

Concept

In (almost) one sentence: my approach is a target the pivot table, create an appropriately sized range next to it, and use the target value column to drill down into the pivot table to only that sub category of data - from here the percentile data is calculated via the QUICKSELECT algorithm and pasted into the named range and the drill down data sheet cleaned/delete. Then wash; rinse; repeat.

In more than one sentence:

Our main function (FindPivotPercentiles) takes as inputs the: target Pivot Table, the header string of the value column to calculate values of, and optionally what percentiles to calculate.

We find the result output range to put our calculated percentiles by calling the PivotEdgeRange function, which takes the target pivot table (used to calculate the vertical extent of the output range) and the percentiles (used to calculate the horizontal extend of the range - so a 10 category length pivot table calculating 0.25, .5, and .75 percentiles will return a 3 x 10 target range just to the right of the pivot table).

We also set a target column to the last value column in the pivot table, this is used only to drill down into the data, the actual value chosen does not matter as long as the categories are chosen appropriately.

We then loop through the rows in the data column range, calling the function PivotPercentiles. This function is the bit that actually drills down into the pivot table, taking the target cell supplied from the calling function, and the header string of the actual value we wish to calculate for. The drill down is performed by using targetCell.ShowDetail = True, and the worksheet that Excel creates is automatically renamed by the subroutine autoRenamePivotDetail (I guess this isn't strictly necessary, but I find not being super explicit about these things bites me in the ass later). The target data column within this drilled-down-data is chosen from the header string supplied earlier, and looped over to create an array.

The desired percentile values are calculated from the array (QuickSelect, QSPartition QSSwapElems), using the QUICKSELECT algorithm, which is like QUICKSORT except that it stops when it's determined that the kth value has been found.

Once the array of percentile values for that row in the pivot table is returned, the drill-down data sheet is no longer needed and is deleted. The percentiles are returned to the top level calling function to be pasted into the "results range" determined in the FindPivotPercentiles function, and the process then repeats for the next available row of data until all are exhausted.

Finally, then all this is done, the union of the range of the results and the range of the actual pivot table is found, and given a name so that every time a slicer is used to change the displayed pivot data, the named range is updated dynamically so any dependencies/graphs update as well.

The output looks a bit like this: example of the output from running this subroutine

In this example, the red range is the target column for drilling down (but not necessarily the actual target data for calculating), the green is the results range, and blue is the output named range for later use by the user

Limitations

  • Can only calculate one target value type
  • Requires other sheet names to have sensible names (ie. nothing else of the form "Sheet#")
  • You will need to use Slicer/PivotTable update events placed in the ThisWorkbook module to actually get it to change each time you use a slicer. There's already a tonne of info out there about this so I won't reproduce any here.

Code

Sub FindPivotPercentiles(ByRef targetWorksheet As Worksheet, _
                         ByRef targetPivot As PivotTable, _
                         ByRef columnToCalc As String, _
                Optional ByRef percentInputs As Variant = -1)
'**********************************************************
'** Finds all Percentile Data for a given pivot table    **
'**********************************************************
Dim valueTitle As String                ' detail sheet column title to calc values from
Dim targetColumn As Range               ' Range of above column
Dim resultColumn As Range               ' output range for percentile values
Dim wholePivotRange As Range            ' selects entire pivot for naming range
Dim percentiles() As Single             ' desired percentile levels to calculate
Dim j As Long                           ' iterator
Dim k As Long                           ' iterator

' ------------------------------------------------------------------------------------
' User selectable things, choose the column to calculate percentiles for (eg "DAP")
' ------------------------------------------------------------------------------------
valueTitle = columnToCalc

' ------------------------------------------------------------------------------------
' Initalize the percentile array with default values if no user specified option present
' ------------------------------------------------------------------------------------
If IsArray(percentInputs) = False Then
    If percentInputs = -1 Then
        ReDim percentiles(1 To 7) As Single
            percentiles(1) = 0#
            percentiles(2) = 0.1
            percentiles(3) = 0.25
            percentiles(4) = 0.5
            percentiles(5) = 0.75
            percentiles(6) = 0.9
            percentiles(7) = 1#
    End If
Else
    ' ------ otherwise make them the input values
    ReDim percentiles(1 To UBound(percentInputs) - LBound(percentInputs) + 1) As Single
    For j = 1 To UBound(percentInputs) - LBound(percentInputs) + 1
        percentiles(j) = percentInputs(j - (1 - LBound(percentInputs)))
    Next j
End If

Set resultColumn = PivotEdgeRange(targetWorksheet, targetPivot, percentiles)
Set targetColumn = targetPivot.DataBodyRange.Resize(, 1)

' ------------------------------------------------------------------------------------
' Clean up work area, create some percentile headers and appropriately format then,
' then calculate percentile values and paste in area to right of pivot table
' ------------------------------------------------------------------------------------
resultColumn.Resize(, 100).EntireColumn.Clear
targetColumn.Resize(1).Offset(-1, 0).Copy
resultColumn.Resize(1).Offset(-1, 0).PasteSpecial xlPasteFormats
resultColumn.Resize(1).Offset(-1, 0) = percentiles

For j = 1 To targetColumn.Rows.count
    resultColumn.Rows(j) = PivotPercentiles(targetColumn.Rows(j), targetWorksheet, valueTitle, percentiles, True)
    Debug.Print "calculating row: " & j
Next j

' ------ format nicely, name range for later use
With resultColumn
    .NumberFormat = "0.00"
    Set resultColumn = .Resize(.Rows.count + 1).Offset(-1, 0)
    .Name = valueTitle & "_Percents"
End With

Set wholePivotRange = targetPivot.TableRange1
Set wholePivotRange = Union(wholePivotRange, resultColumn)
wholePivotRange.Name = valueTitle & "_WholeTable"

End Sub

PivotEdgeRange

Function PivotEdgeRange(ByRef targetWorksheet As Worksheet, _
                        ByRef targetPivot As PivotTable, _
               Optional ByRef percentInputs As Variant = -1) As Range
'*************************************************************************
'** For a given pivot table, set a range to the right most empty column **
'*************************************************************************
Dim numDataCols As Long                 ' number of value fields in table
Dim howManyOutputColumns As Integer     ' how many percentile calculation cols

If IsArray(percentInputs) = False Then
    If percentInputs = -1 Then: howManyOutputColumns = 7
Else
    howManyOutputColumns = UBound(percentInputs)
End If

numDataCols = targetPivot.DataBodyRange.Columns.count
Set PivotEdgeRange = targetPivot.DataBodyRange.Resize(, howManyOutputColumns).Offset(, numDataCols)

End Function

PivotPercentiles

Function PivotPercentiles(ByRef targetCell As Range, _
                          ByRef pivotCacheSheet As Worksheet, _
                          ByRef valueTitle As String, _
                 Optional ByRef percentInputs As Variant = -1, _
                 Optional ByRef suppressErrors As Boolean = False) As Double()
'************************************************************************
'** Finds the Percentile Data for a given grouping in a pivot table    **
'************************************************************************
Dim targetColumn As Range           ' targetColumn for calculation
Dim numberOfVals As Long            ' number of values, used to determine array size
Dim j As Long                       ' iterator
Dim badData As Boolean              ' flags that non-numeric data was in target column
Dim range2Array() As Double         ' coverts target range to array for faster calculation
Dim percentiles() As Single         ' desired percentile levels to calculate
Dim percentOutputs() As Double      ' holds calculated percentiles
Dim targetkth As Long               ' kth smallest value to extract from quickselect
' ------------------------------------------------------------------------------------
' Initalize the percentile array with default values if no user specified option
' present. This is already done in FindPivotPercentiles, so is mostly redundant, but
' I have left this in so you may have the option of using this as a stand alone function
' ------------------------------------------------------------------------------------
If IsArray(percentInputs) = False Then
    If percentInputs = -1 Then
        ReDim percentiles(1 To 7) As Single
            percentiles(1) = 0#
            percentiles(2) = 0.1
            percentiles(3) = 0.25
            percentiles(4) = 0.5
            percentiles(5) = 0.75
            percentiles(6) = 0.9
            percentiles(7) = 1#
    End If
Else
    ' ------ otherwise make them the input values
    ReDim percentiles(1 To UBound(percentInputs) - LBound(percentInputs) + 1) As Single
    For j = 1 To UBound(percentInputs) - LBound(percentInputs) + 1
        percentiles(j) = percentInputs(j - (1 - LBound(percentInputs)))
    Next j
End If

ReDim percentOutputs(1 To UBound(percentiles)) As Double
' ------------------------------------------------------------------------------------
' Show detail in target cell so we can compute the percentile data values. Show detail
' creates a new sheet with name "Sheet*" - we switch to this, and compute the medians
' in this space, before deleting it.
' ------------------------------------------------------------------------------------
targetCell.ShowDetail = True
Call autoRenamePivotDetail
Set pivotCacheSheet = ThisWorkbook.Sheets("PivotDetail")

Set targetColumn = pivotCacheSheet.ListObjects(1).ListColumns(valueTitle).DataBodyRange
numberOfVals = targetColumn.Rows.count

' ----- convert range to array, quicker computation time
ReDim range2Array(1 To numberOfVals) As Double
For j = 1 To targetColumn.Rows.count
    Select Case VarType(targetColumn(j))
        Case 2 To 4
            range2Array(j) = CDbl(targetColumn(j))
        Case 5
            range2Array(j) = targetColumn(j)
        Case 8
            range2Array(j) = CDbl(targetColumn(j))
        Case Else
            range2Array(j) = Empty
            badData = True
    End Select
Next j
' ------------------------------------------------------------------------------------
' Now we have an array to play with, loop over desired percentile values and calculate
' If arrays are no larger than 10, the .Percentile_Inc function can be used instead -
' this is neccessary as QuickSelect breaks when less than 3 array size.
' ------------------------------------------------------------------------------------
For j = 1 To UBound(percentiles)
    If UBound(range2Array) < 10 Then
        percentOutputs(j) = Application.WorksheetFunction.Percentile_Inc(range2Array, percentiles(j))
    Else
        targetkth = percentiles(j) * UBound(range2Array)
        If targetkth = 0 Then: targetkth = 1
        percentOutputs(j) = QuickSelect(range2Array, targetkth, 1, UBound(range2Array))
    End If
Next j

' ------ clean up
Application.DisplayAlerts = False
pivotCacheSheet.Delete
Application.DisplayAlerts = True

If badData And suppressErrors <> True Then
    MsgBox "Bad data (non-numeric) was found in the target range. Please ensure better cleaning of input data."
End If

' ----- output values to calling function
PivotPercentiles = percentOutputs
End Function

autoRenamePivotTable

Sub autoRenamePivotDetail()
'*****************************************************************************
'** Automatically rename the newly created detailed pivot data. This        **
'** relies on all worksheets otherwise having sensible names, it will ruin  **
'** your sheet names otherwise                                              **
'*****************************************************************************
Dim detailedPivotFound As Boolean       ' True if pivot detail sheet found
Dim wSheet As Worksheet                 ' Worksheet iterator
Dim renameSheet As Worksheet            ' Sheet to actually rename

detailedPivotFound = False
If CheckPresent("PivotDetail", "Worksheet") Then
    Err.Raise Number:=2000, _
    Source:="AlreadyPresent", _
    Description:="Cannot rename worksheet; 'PivotDetail sheet already present"
End If

' ------ cycle through, change name to pivotsheet
For Each wSheet In ThisWorkbook.Worksheets
    If wSheet.Name Like "Sheet*" And detailedPivotFound = True Then
        Err.Raise Number:=2000, _
        Source:="AlreadyPresent", _
        Description:="Cannot expand pivot detail; spare 'sheet' name already present."
    End If
    If wSheet.Name Like "Sheet*" Then
        detailedPivotFound = True
        Set renameSheet = wSheet
    End If
Next wSheet
renameSheet.Name = "PivotDetail"

' ------------------------------------------------------------------------------------
' Error handling for cases which do not have a CheckPresent datatype defined for them
' as of yet, or if multiple "sheet*" names are found
' ------------------------------------------------------------------------------------
AlreadyPresent_End:
    Exit Sub
AlreadyPresent_Err:
    MsgBox Prompt:="Error number " & Err.Number & " was raised. " & _
        vbCrLf & "Source: " & Err.Source & vbCrLf & _
        "Description: " & Err.Description
    Resume AlreadyPresent_End
End Sub

All the QuickSelect stuff:

Function QuickSelect(list() As Double, k As Long, startInterval As Long, endInterval As Long) As Double
'**********************************************************************
'** Uses a partial form of QUICKSELECT to find k'th percentile value **
'**********************************************************************
Dim pivotInterval As Long                       ' where the current pivotpoint is
Dim splitInterval As Long                       ' where to split the interval
Dim notFinished As Boolean: notFinished = True  ' flips to true once kth val found
' ------------------------------------------------------------------------------------
' Adapted from:
' https://stackoverflow.com/questions/3779763/fast-algorithm-for-computing-percentiles-to-remove-outliers
' More information on the QuickSelect algorithm: https://en.wikipedia.org/wiki/Quickselect
' ------------------------------------------------------------------------------------
While notFinished
    pivotInterval = CLng(Floor(startInterval + endInterval) / 2)

    ' ------ make sure min and max cases handled properly
    If k = 1 Then pivotInterval = CLng(Floor(startInterval + endInterval) / 2)
    If k = endInterval Then pivotInterval = CLng(Ceiling(startInterval + endInterval) / 2)

    splitInterval = QSPartition(list, startInterval, endInterval, pivotInterval)

    If k < splitInterval Then
        endInterval = splitInterval
    ElseIf k > splitInterval Then
        startInterval = splitInterval + 1
    Else
        QuickSelect = list(k)
        notFinished = False
    End If

    ' ------ break loop for max case.
    If k = startInterval And k = endInterval Then
        QuickSelect = list(k)
        notFinished = False
    End If
Wend
End Function

Function QSPartition(list() As Double, startInterval As Long, endInterval As Long, _
                    pivotInterval As Long) As Long
'**********************************************************
'** Swaps higher and lower elements within the partition **
'**********************************************************
Dim pivotValue As Double        ' the value of the pivot point
Dim newPivot As Double          ' the new pivot point after partitioning
Dim storeInterval As Double     ' temp value to help swapping pivot value
Dim i As Double                 ' iterator

pivotValue = list(pivotInterval)
list(pivotInterval) = list(startInterval)
list(startInterval) = pivotValue

storeInterval = startInterval + 1
While (storeInterval < endInterval) And list(storeInterval) <= pivotValue
    storeInterval = storeInterval + 1
Wend

For i = storeInterval + 1 To endInterval
    If list(i) <= pivotValue Then
        Call QSSwapElems(list, i, storeInterval)
        storeInterval = storeInterval + 1
    End If
Next i

newPivot = storeInterval - 1
list(startInterval) = list(newPivot)
list(newPivot) = pivotValue

QSPartition = newPivot
End Function

Sub QSSwapElems(ByRef list() As Double, ByVal i As Long, ByVal j As Long)
'************************
'** Swap list elements **
'************************
Dim temp As Double

temp = list(i)
list(i) = list(j)
list(j) = temp
End Sub

Other Misc Needed Code

Function CheckPresent(checkName As String, checkType As String) As Boolean
'*******************************************************************
'** Checks that <checkname> is present in object type <checktype> **
'*******************************************************************
Dim index As Long                   ' iterator

CheckPresent = False
index = 1

Select Case checkType
    Case "Connections"
        ' - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        ' Find whether a given named Connection exists
        ' - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        Dim conName As Connections
        Dim conParent As Variant

        Set conParent = ThisWorkbook.Connections
        While (CheckPresent <> True And index <= conParent.count)
            If conParent(index).Name = checkName Then: CheckPresent = True
            index = index + 1
        Wend

    Case "Worksheet"
        ' - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        ' Find whether a given named worksheet exists
        ' - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        Dim wName As Worksheet
        Dim sheetParents As Sheets

        Set sheetParents = ThisWorkbook.Worksheets
        While (CheckPresent <> True And index <= sheetParents.count)
            If sheetParents(index).Name = checkName Then: CheckPresent = True
            index = index + 1
        Wend

        ' - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        ' handle cases where no data type exists,
        ' - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    Case Else
        Err.Raise Number:=vbObjectError + 1001, _
        Source:="TypeNotPresent", _
        Description:="This function does not handle variable type: " & checkType & _
                     ". Please create a case to handle it in the CheckPresent function."

End Select

' ------------------------------------------------------------------------------------
' Error handling for cases which do not have a CheckPresent datatype defined for them
' as of yet.
' ------------------------------------------------------------------------------------
TypeNotPresent_End:
    Exit Function

TypeNotPresent_Err:
    MsgBox Prompt:="Error number " & Err.Number & " was raised. " & _
        vbCrLf & "Source: " & Err.Source & vbCrLf & _
        "Description: " & Err.Description
    Resume TypeNotPresent_End


End Function
lolsail
  • 93
  • 1
  • 8