16

I am trying to reverse engineer an algorithm used to generate a check digit.

Numbers are 8 digits long and the last digit is the check digit. I have thousands of valid numbers to test it on.

I have tried a standard Luhn, Verhoeff and modulo-10 algorithms (brute force checking of all possible weights), but could not find an answer!

Is it possible to calculate this? Any ideas?

Here is some examples of valid numbers:

1002784-5
1000514-7
1001602-8
1001255-2
1001707-1
1003355-5
1005579-1
1004535-0
1004273-1
1001695-9
1004565-9
1000541-9
1001291-1
1005866-1
1004352-7

EDIT: Thanks guys - I don't have access to the code unfortunately. The number is a tax number, I need to be able to verify that the number was typed in correctly. From my research is looks like most countries use a pretty standard modulo-10 type system. I've got access to about 60 000 numbers.

I understand that the problem could be impossible to solve, it was more of academic concern.

Neil
  • 227
  • 2
  • 8
  • 5
    You should compare 2 numbers whose "distance" (number of difference) is 1, if possible, find as many pairs as possible. You can also look at numbers with the same checksum and see if you can find out anything about it. – nhahtdh Nov 21 '12 at 14:49
  • 4
    Can you feed arbitrary numbers into the check digit algorithm? E.g. if you can observe what the check digit is for "1", then "2", etc. then it should be fairly easy to find the pattern. – j_random_hacker Nov 21 '12 at 14:50
  • 1
    Another idea: Do you know the form of the check digit calculation? If it is simple, like a polynomial in the digits that has unknown coefficients, then you should be able to use Gauss-Jordan elimination to find the coefficients (provided you have more input-output pairs than there are terms in the polynomial -- and it sounds like that's the case). – j_random_hacker Nov 21 '12 at 14:53
  • 1
    Do you have access to the code that generates the sums? – Igor Skochinsky Nov 21 '12 at 14:57
  • 2
    I dont think so Igor, otherwise his reverse engineering has no point. – UmNyobe Nov 21 '12 at 14:58
  • 1
    I also tested with bruteforce, any formula in the form: `(w1d1^p1 + w2d2^p2 + ... + w7d7^p7 + w8) % 10` where `0 < wi < 10` and `0 < pi < 9` but none of them could match this set of data. – Shahbaz Nov 21 '12 at 16:45
  • 1
    @UmNyobe Not necessarily - I had to do the same once for some mainframe code which was a mix of PL/1 + assembler. Given the limited timeframe, it was easier to decipher the check digit algorithm than try and walk the code. – Robbie Dee Nov 21 '12 at 16:50
  • @UmNyobe, or there could be binary code, which can be disassembled. – Shahbaz Nov 21 '12 at 17:26
  • 1
    Can you elaborate on the domain? – Mikko L Nov 22 '12 at 13:04
  • I vote we close this. Unless it is a recognised algorithm we stand little chance of deciphering it. This question is also a dupe: http://stackoverflow.com/questions/2332075/given-a-number-series-finding-the-check-digit-algorithm – Robbie Dee Nov 23 '12 at 13:17

2 Answers2

2

First check your context:

If context is credit cards, driver's licenses, government licensing numbers (not SSN) think Luhn or Mod 10. If some other industry, does that industry have a defacto standard? If not, is the developer of the system using the numbers also a player in an industry that has a de facto standard?

Nobody likes to reinvent the wheel if they don't have to.

If that doesn't help keep in mind:

Don't assume that all the numbers in the keys you are testing against are used to arrive at the check digit. It's possible only 4 or the 8 digits are being used to calculate the check digit (or any other combination). It's also possible there is some external PREFIX number that is used with the other digits to arrive at the check digit. So... line up all your numbers with the same check digit, and see what the similarities are. Can you add a number to them and then always reach check digit? Can you test only the first few digits? Last few digits? every other digit?

Good luck.

  • 2
    If you allow your weighted sum to have zero weights and add a constant to it also, then at least the mod 10 algorithm has been tested (with brute-force) and didn't work. – Shahbaz Nov 23 '12 at 09:55
0

Count how many times in your data (60 thousand) there are digits 0,1,2,3,4,5,6,7,8,9 as a check digit. If the digit 0 occurs twice as often as other digits, it means that the algorithm uses the modulo 11 operation. In this algorithm, if the sum mod 11 = 10, then the check digit is 0.

Romek
  • 1