0

Can you please assist / request a code piece to efficiently sort. Cannot find radix sort for vbscript - 2D arrays / able to implement well.

Sample structure of my array is :

resultarray(0,1) = "Name1"
resultarray(1,1) = "Score1"
resultarray(2,1) = "Category1"
resultarray(3,1) = "OtherDetail1"
resultarray(4,1) = "OtherDetail1"
resultarray(5,1) = "OtherDetail1"

resultarray(0,2) = "Name2"
resultarray(1,2) = "Score2"
resultarray(2,2) = "Category2"
resultarray(3,2) = "OtherDetail2"
resultarray(4,2) = "OtherDetail2"
resultarray(5,2) = "OtherDetail2"

resultarray(0,3) = "Name3"
resultarray(1,3) = "Score3"
resultarray(2,3) = "Category3"
resultarray(3,3) = "OtherDetail3"
resultarray(4,3) = "OtherDetail3"
resultarray(5,3) = "OtherDetail3"

The array has to be sorted based on the second column , ie Score. The number of rows would be around a few millions. Score will always be a positive integer (will require two digit decimals in near future). Speed is very important , as this has to be done for a range of few ten-thousands to million numbers , for 30 - 40 different groups.

Presently using Quicksort exactly from :

https://web.archive.org/web/20210125130007/http://www.4guysfromrolla.com/webtech/012799-3.shtml

I have interchanged row <-> column in my implementation , then this works fine. But slow. Is it worth changing the sorting technique from this existing QuickSort.

I intend to use Binary search later for searching around 2000 elements based on the Score match.

Thanks

arcotenterprises
  • 131
  • 1
  • 4
  • 14

3 Answers3

0

What about something like this (using the maximum score as the upper dimension on the sortedarray):

Dim sortedarray(5, MAXIMUM_SCORE_GOES_HERE)
for i=LBound(resultarray) to UBound(resultarray)
    idxTarget = resultarray(1,i);
    sortedarray(0,idxTarget) = resultarray(0,i)
    sortedarray(1,idxTarget) = resultarray(1,i)
    sortedarray(2,idxTarget) = resultarray(2,i)
    sortedarray(3,idxTarget) = resultarray(3,i)
    sortedarray(4,idxTarget) = resultarray(4,i)
    sortedarray(5,idxTarget) = resultarray(5,i)
Next

It's Radix sort, without the radix part. It doesn't get any faster than this. I unrolled the "inner loop" as you have, but it might be faster with an inner loop in place: rather than indexing the first dimension, try looping through it, it might actually be faster. Also you might want to try For Each...Next instead of the for loop written above: sometimes For Each is faster. But as far as the sorting algorithm, faster is not possible.

angelatlarge
  • 4,086
  • 2
  • 19
  • 36
  • another issue , I am facing now is with "Out of stack space". Dont think it is the number of records , ( around 250,000 only). – arcotenterprises Mar 11 '13 at 05:15
  • Is it doing that during the sort? On what line? – angelatlarge Mar 11 '13 at 05:55
  • '== Recursively call function .. the beauty of Quicksort '== 2 or more items in first section if loBound < (hiSwap - 1) then Call QuickSort(vec,loBound,hiSwap-1,SortField) '== 2 or more items in second section – arcotenterprises Mar 11 '13 at 06:00
  • trying out with different set of data , it worked for one set with 450,000 rows. Is it the number of comparisons , that makes the difference ? How to go about this. – arcotenterprises Mar 11 '13 at 06:02
  • Well, quicksort *is* recursive, so that makes sense. If the pivot is chosen perfectly, you will have fewer recursive calls, but I thought you wanted to do better than quicksort. – angelatlarge Mar 11 '13 at 06:30
  • being recursive out of stack is expected. But , 1. How come it does not come for 450,000 data , 700,000 data - only for that one particular set. Also dont thinkfew hundred thousand is really a big number. 2. I want to do better than quick sort. If you can let me send you my code file and sample data , may be you can see where I am going wrong and advise in a general manner. I will be highly obliged. Thanks. – arcotenterprises Mar 11 '13 at 06:44
  • Depending on the implementation, Quicksort may be non-deterministic. Does your quicksort implementation perform a random shuffle first? That's one source of randomness. The other one is the pivot picking algorithm. If you get unlucky on those counts you _could_ (in principle) have n^2 performance, which will take n recursive calls, instead of log(n) recursive calls. In practice you may get anything in between. – angelatlarge Mar 11 '13 at 06:49
0

I don't know how flexible you are to change and bend your data, but you could use an ArrayList and perform the Sort method on it. This is a proof of concept:

option explicit
' Create an arraylist to add items to
Dim i, score, t
dim list : Set list = CreateObject("System.Collections.ArrayList")

for i = 0 to 1000000
        ' Make an arbitrairy score between 0 and 100
        score = cint(rnd*100)
        ' pad with zero's to make the sort work correctly
        score = string(3 - len(score), "0") & score
        ' Add it to the arraylist
        list.add join(array(score, "name", "category", "details1", "details2", "details3"), vbTab)
Next

' Sort the list
t = timer()
list.sort

' Show the list
wscript.echo "duration: " & timer() - t
wscript.echo join(list.toArray(), vbNewLine)

This returns a duration of 2.6 second to sort 1.000.000 items on my machine (Intel i5).

As said before, you cannot use it directly on your data format, but if performance is an important requirement it could be worth to change your data model.

AutomatedChaos
  • 7,267
  • 2
  • 27
  • 47
  • for addition food for thought (especially SortedList), see http://stackoverflow.com/q/14563678/603855 – Ekkehard.Horner Mar 11 '13 at 16:01
  • @Ekkehard.Horner thought about that, but the key list (that would be score) must be unique, while same scores can be presented. I could not get sortedlist working for other values than longs in the key list, but maybe I did something wrong. Otherwise it would be nice because you can add an homebrew `ScoreItem` object in the value entry. – AutomatedChaos Mar 11 '13 at 18:49
0

To illustrate why

  1. you shouldn't duplicate the data from the original array in the ArrayList, but use/store an index
  2. a SortedList is appropriate for a problem of grouping (not sorting)

Proof of concept script:

  Dim nRecs : nRecs = 100
  ReDim aOrg(1, nRecs)
  Dim i
  For i = 1 To nRecs ' aOrg[0] intentinally wasted
      aOrg(0, i) = "Item " &  i
      aOrg(1, i) = IRandR(5, 16)
  Next
  Dim slX : Set slX = CreateObject("System.Collections.SortedList")
  For i = 1 To nRecs
      If Not slX.Contains(aOrg(1, i)) Then Set slX(aOrg(1, i)) = CreateObject("System.Collections.ArrayList")
      slX(aOrg(1, i)).Add i
  Next
  For i = 0 To slX.Count - 1
      WScript.Echo "---- #", i, "score:", slX.GetKey(i)
      WScript.Echo vbTab, Join(slX.GetByIndex(i).ToArray())
      WScript.Echo vbTab, Join(GetCargo(aOrg, slX.GetByIndex(i)))
  Next

Function GetCargo(aO, aI)
  ReDim aTmp(aI.Count - 1)
  Dim i
  For i = 0 To UBound(aTmp)
      aTmp(i) = aO(0, aI(i))
  Next
  GetCargo = aTmp
End Function

output:

---- # 0 score: 5
         7 11 18 23 44 50 69 85 96 99
         Item 7 Item 11 Item 18 Item 23 Item 44 Item 50 Item 69 Item 85 Item 96 Item 99
---- # 1 score: 6
         41 46 47 58 59 80
         Item 41 Item 46 Item 47 Item 58 Item 59 Item 80
---- # 2 score: 7
         29 36 39 66 67 72 95 97
         Item 29 Item 36 Item 39 Item 66 Item 67 Item 72 Item 95 Item 97
---- # 3 score: 8
         4 5 26 30 49 51 53 57 64 75 93
         Item 4 Item 5 Item 26 Item 30 Item 49 Item 51 Item 53 Item 57 Item 64 Item 75 Item 93
---- # 4 score: 9
         12 15 20 52 56 61 62 74 79 88 94 100
         Item 12 Item 15 Item 20 Item 52 Item 56 Item 61 Item 62 Item 74 Item 79 Item 88 Item 94 Item 100
---- # 5 score: 10
         2 21 25 40 70 86 90 91 92
         Item 2 Item 21 Item 25 Item 40 Item 70 Item 86 Item 90 Item 91 Item 92
---- # 6 score: 11
         3 24 27 33 45 65 68 77 78 81
         Item 3 Item 24 Item 27 Item 33 Item 45 Item 65 Item 68 Item 77 Item 78 Item 81
---- # 7 score: 12
         1 10 28 37 43 60 63 82 89
         Item 1 Item 10 Item 28 Item 37 Item 43 Item 60 Item 63 Item 82 Item 89
---- # 8 score: 13
         6 8 9 14 22 48 73
         Item 6 Item 8 Item 9 Item 14 Item 22 Item 48 Item 73
---- # 9 score: 14
         13 17 31 32 71 84
         Item 13 Item 17 Item 31 Item 32 Item 71 Item 84
---- # 10 score: 15
         16 19 34 35 38 42 54 55 76 83 87 98
         Item 16 Item 19 Item 34 Item 35 Item 38 Item 42 Item 54 Item 55 Item 76 Item 83 Item 87 Item 98
Ekkehard.Horner
  • 38,498
  • 2
  • 45
  • 96