backstory

I put together a simple multi-threaded brute-force hash hacking program for a job application test requirement.

Here are some of the particulars

It functions properly, but the performance is quite a bit different between my initial version and this altered portion.

factors

The reason for the alteration was the increased number of possible combinations between the sample data and the test/challenge data.

The application test sample was 16^7 total combinations.. which is of course less than the uint32 range (16^8).

The challenge is a 9-length hashed string that produces a hashed long value (that I was given); thus it is 16^9. The size difference was something I accounted for, which is why I took the easy route of putting the initial program together targeting the 7-length hashed string - getting it to function properly on a smaller scale.

Overall, the issue isn't just the increased combinations; it is dramatically slower due to the loop operating on long/int64 or uint64..

When I crunched the numbers using int32 (not even uint32) data types.. I could hear my comp kick it up a notch.. The entire check was done in under 4 minutes. That's 16777216 (16^6) combination checks per thread..

Noteworthy - multithreading: I broke everything into worker threads.. 16 of them, 1 for each of the beginning characters.. thus I'm only looking for 16^8 combinations on each thread now... which is 1 freaking unit higher than the uint32 max value (the range includes 0)...
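
For reference, the sizes mentioned above work out as follows. This is plain arithmetic on the figures already given, set next to the standard 32-bit integer limits:

```latex
\begin{align*}
16^6 &= 2^{24} = 16{,}777{,}216 \\
16^7 &= 2^{28} = 268{,}435{,}456 \\
16^8 &= 2^{32} = 4{,}294{,}967{,}296 \\
16^9 &= 2^{36} = 68{,}719{,}476{,}736 \\
\text{Int32.MaxValue} &= 2^{31} - 1 = 2{,}147{,}483{,}647 \\
\text{UInt32.MaxValue} &= 2^{32} - 1 = 4{,}294{,}967{,}295
\end{align*}
```

So a per-thread count of 16^8 is exactly UInt32.MaxValue + 1, which is why a 1-based loop counter no longer fits in a 32-bit type.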

I'll give a final thought after I put up this code segment..

The function is as follows:

Function Propogate() As Boolean
    Propogate = False
    Dim combination As System.Text.StringBuilder = New System.Text.StringBuilder(Me.ScopedLetters)

    For I As Integer = 1 To (Me.ResultLength - Me.ScopedLetters.Length) Step 1
        combination.Append(Me.CombinationLetters.Chars(0))
    Next

    'Benchmarking feedback - This simply adds values to a list to be checked against to denote progress
    Dim ProgressPoint As New List(Of Long)


    '###############################
    '#[THIS IS THE POINT OF INTEREST] 
    '# Me.CombinationLetters = 16    #  
    '# Me.ResultLength = 7 Or 9      #  The 7 was the sample size provided.. 9 was the challenge/test
    '# Me.ScopedLetters.Length = 1   #  In my current implementation 
    '###############################
    Dim TotalCombinations As Long = CType(Me.CombinationLetters.Length ^ (Me.ResultLength - Me.ScopedLetters.Length), Long)

    ProgressPoint.Add(1)
    ProgressPoint.Add(CType(TotalCombinations / 5, Long))
    ProgressPoint.Add(CType(TotalCombinations * 2 / 5, Long))
    ProgressPoint.Add(CType(TotalCombinations * 3 / 5, Long))
    ProgressPoint.Add(CType(TotalCombinations * 4 / 5, Long))
    ProgressPoint.Add(CType(TotalCombinations, Long))

    For I As Long = 1 To TotalCombinations Step 1
        Me.AddKeyHash(combination.ToString) 'The hashing arithmetic and Hash value check is done at this call. 
        Utility.UpdatePosition(Me.CombinationLetters, combination) 'does all the incremental character swapping and string manipulation.. 
        If ProgressPoint.Contains(I) Then
            RaiseEvent OnProgress(CType((I / TotalCombinations) * 100, UInteger).ToString & " - " & Me.Name)
        End If
    Next
    Propogate = True
End Function

I already have an idea of what I could try: drop it back down to int32 and put another loop around this loop (16 iterations).

But there might be a better alternative, so I would like to hear from the community on this one.
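
The nested-loop idea could look roughly like the sketch below. It is only an illustration of the loop shape, not a measured fix: `NestedLoopSketch` and `CountIterations` are hypothetical names, the counter stands in for the real per-combination work, and whether this actually runs faster than a single Long loop would still need timing.

```vbnet
Option Strict On
Imports System

Module NestedLoopSketch
    ' Hypothetical helper: performs outerCount * innerCount iterations using
    ' only Int32 loop counters, and returns how many iterations ran.
    Function CountIterations(outerCount As Integer, innerCount As Integer) As Long
        Dim total As Long = 0
        For outer As Integer = 1 To outerCount
            For inner As Integer = 1 To innerCount
                ' In the real function, the AddKeyHash and UpdatePosition
                ' calls would go here instead of this counter.
                total += 1
            Next
        Next
        Return total
    End Function

    Sub Main()
        ' Small demo sizes so it finishes instantly. For the challenge,
        ' 16 outer passes of 16^7 (268,435,456) inner iterations would
        ' cover the 16^8 combinations handled by each thread.
        Console.WriteLine(CountIterations(16, 4096)) ' prints 65536
    End Sub
End Module
```

Since 16^7 is below Int32.MaxValue, both counters stay in 32-bit range, while the combination string itself keeps advancing across outer passes as before.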

Would a For loop using a double-precision floating-point counter cycle better?


By the way, how coupled are long types and their arithmetic to CPU architecture.. specifically caching?

My development comp is old.. Pentium D running XP Professional x64 .. my excuse is that if it runs in my environment, it will likely run on Win Server 2003..

Brett Caswell
  • The C++ tag is for questions regarding C++, which this question is not about. – Some programmer dude Jul 25 '14 at 12:28
  • well.. I wasn't really intending on adding it.. it was suggested at the bottom of the post.. so I figured the system saw fit to include the C++ folk.. – Brett Caswell Jul 25 '14 at 12:31
  • Don't look at the loop itself. Look at what's inside it. – Mike Dunlavey Jul 25 '14 at 12:32
  • right, but everything inside the loop was nearly exactly the same.. the only difference was the benchmark/progressPoints were turned to Long as well.. which certainly brings the List.Contains method into question concerning the `long` arithmetic.. but that's back to the heart of the matter isn't it.. is it too taxing to do loops and checks using long data types? – Brett Caswell Jul 25 '14 at 12:36
  • Are you running the code in a 64-bit process? – Alireza Jul 25 '14 at 12:44
  • no.. I suspect it's 32-bit.. I have it running at the moment in the vshost debugger.. at this point, it's like an hour into it.. and 3 of the threads have reached 20%.. which, again, seems quite slow.. I mean I know the combinations are about 800 million for each of those threads.. but it took about 4 minutes to complete 600 million before the type alteration.. – Brett Caswell Jul 25 '14 at 12:50
  • wait.. I mean 20% is about 800 million.. – Brett Caswell Jul 25 '14 at 12:51
  • Does your CPU have at least 3 cores? If not, then threading is kind of pointless. – the_lotus Jul 25 '14 at 12:53
  • wait.. hmm 16^7 isn't near 600 million.. it's about 268 million?.. hmm.. maybe it's not as bad as I thought.. the rev-less comp thing is still a query to me though.. – Brett Caswell Jul 25 '14 at 12:54
  • Well, what I always do is randomly pause it under the debugger and *find out* what's taking the time. Guessing, or inviting other people to guess, is a great intellectual challenge, but doesn't give you the answer. If you have a guess, [*randomly pausing will tell you if you're right.*](http://stackoverflow.com/a/378024/23771) – Mike Dunlavey Jul 25 '14 at 13:46
  • @MikeDunlavey, that was a pretty interesting reference.. Unfortunately my call stack and parallel stack are extremely uninformative.. the byte offset was the only portion of information that was unknown to me.. neither the stack nor the parallel stack is providing me with any form of metrics.. I suspect it is because I'm running in a .NET managed environment.. My usual method is to implement some System.Diagnostics stopwatches.. However, you did put me on the hunt.. I pulled up the Spy++ tool and am using it to view the vshost threads.. thanks for the suggestion. – Brett Caswell Jul 25 '14 at 14:33
  • @Brett: To see what's going on, it will be doing it threads or no threads, so I would simplify by turning threads off. When it's optimized that way, then I would turn on the threads and look for things like mutex blocking. – Mike Dunlavey Jul 25 '14 at 14:47

1 Answer

In the end, this could well have been a hardware issue.. my old workstation did not survive much longer after this project.

Brett Caswell