Checksum for an integer array?

Question

I have an array that is of size 4,9,16 or 25 (according to the input) and the numbers in the array are the same but less by one (if the array size is 9 then the biggest element in the array would be 8) the numbers start with 0 and I would like to do some algorithm to generate some sort of a checksum for the array so that I can compare that 2 arrays are equal without looping through the whole array and checking each element one by one.

Where can I get this sort of information? I need something that is as simple as possible. Thank you.

edit: just to be clear on what I want:

-All the numbers in the array are distinct, so [0,1,1,2] is not valid because there is a repeated element (1)

-The position of the numbers matter, so [0,1,2,3] is not the same as [3,2,1,0]

-The array will contain the number 0, so this should also be taken into consideration.

EDIT:

Okay I tried to implement the Fletcher's algorithm here: http://en.wikipedia.org/wiki/Fletcher%27s_checksum#Straightforward

int fletcher(int array[], int size){
  int i;
  int sum1=0;
  int sum2=0;
  for(i=0;i<size;i++){
    sum1=(sum1+array[i])%255;
    sum2=(sum2+sum1)%255;
  }
  return (sum2 << 8) | sum1;
}

to be honest I have no idea what does the return line do but unfortunately, the algorithm does not work. For arrays [2,1,3,0] and [1,3,2,0] I get the same checksum.

EDIT2:

okay here's another one, the Adler checksum http://en.wikipedia.org/wiki/Adler-32#Example_implementation

#define MOD 65521;

unsigned long adler(int array[], int size){
  int i;
  unsigned long a=1;
  unsigned long b=0;
  for(i=0;i<size;i++){
    a=(a+array[i])%MOD;
    b=(b+a)%MOD;
  }
  return (b <<16) | a;
}

This also does not work. Arrays [2,0,3,1] and [1,3,0,2] generate same checksum. I'm losing hope here, any ideas?

The numbers in an array are not unique, I guess?! So { 1,2,2,4 } is valid? — ur., Jun 05 '12 at 06:18
> the numbers in the array are the same Can you elaborate on that? — Nick, Jun 05 '12 at 06:19
oh sorry I did not mention! Yes the numbers are unique, so [1,2,2,4] is NOT valid. — MinaHany, Jun 05 '12 at 06:21
{1,2,3,4} and {4,3,2,1}. Are these 2 arrays equal according to your definition? — Pavan Manjunath, Jun 05 '12 at 06:29
So, basically, if your array size is 4, then some examples of possible elements are: {1,2,3,4}, {1,3,4,2}, {1,4,3,2} {4,3,2,1}.. etc But, the always the elements will be between 1 and 4. Is this right? — Jay, Jun 05 '12 at 06:37
Yes, that's right! But I just realised that the array will contain the number 0 so possiblme elements are : [0,1,2,3], [0,2,3,1], [0,3,2,1] and so on — MinaHany, Jun 05 '12 at 06:42
Sounds very much like some sort of Sudoku generator or solver. There is like plenty of code for that to be found on the web. — Lundin, Jun 05 '12 at 06:50
Also, your spec seems incorrect. If the array of size 9 contains distinct numbers between 0 and 9, then 10 different numbers cannot fit in 9 array positions. One number between 0 and 9 will be left out in each array. Is that the case? — Lundin, Jun 05 '12 at 06:51
If you are looking for a simple solution, then that is to compare the arrays one element at a time. For an array of size 9, with ten possible values per item, the chance of finishing the comparison at the first attempt is 9 out of 10. If the first items are identical, the chance that the next comparison will be the last one, is 8 out of 9. And so on. This would be quite an effective algorithm in terms of comparisons (if not in terms of branch prediction). — Lundin, Jun 05 '12 at 07:01
@Lundin no, I meant the biggest element would be 8 not 9 (elements are from 0 to size-1 — MinaHany, Jun 05 '12 at 07:09
@MinaHany Still, I doubt you will be able to write more effective code than the simple for loop. A checksum calculation with modulo, division etc like those you posted probably takes 100 times the execution time than the worst case for loop (24 identical elements). So the real question would be if this checksum makes any sense in the particular application. — Lundin, Jun 05 '12 at 13:02

Nicolas Repiquet · Accepted Answer · 2012-06-05T10:22:57.153

5

Let's take the case of your array of 25 integers. You explain that it can contains any permutations of the unique integers 0 to 24. According to this page, there is 25! (25 factorial) possible permutations, that is 15511210043330985984000000. Far more than a 32bit integer can contains.

The conclusion is that you will have collision, no matter how hard you try.

Now, here is a simple algorithm that account for position:

int checksum(int[] array, int size) {
  int c = 0;
  for(int i = 0; i < size; i++) {
    c += array[i];
    c = c << 3 | c >> (32 - 3); // rotate a little
    c ^= 0xFFFFFFFF; // invert just for fun
  }
  return c;
}

edited Jun 05 '12 at 10:22

answered Jun 05 '12 at 10:17

Nicolas Repiquet

9,097
2
31
53

I disagree with the claim that collision is unavoidable. @UmNyobe's solution, though not efficient enough for this particular case, guarantees 0% collision. Saying something's *very difficult* is quite different from saying that something's *not possible* at all :) – Pavan Manjunath Jun 05 '12 at 10:42
I'm sorry but this doesn't work either; 7 out of 12 arrays I am testing generate the same checksum. – MinaHany Jun 05 '12 at 11:30
3

@PavanManjunath I'm not saying that it's difficult, I say it's impossible. You **can't** have 15511210043330985984000000 different 32bit integers, so you **can't** have a unique checksum per possible array, so you **will** have collision. – Nicolas Repiquet Jun 05 '12 at 12:08
According to @UmNyobe's solution which uses index as the checksum, you need not have 15511210043330985984000000 different 32 bit integers in your program. You just need to calculate the indices for the 2 arrays that the user wishes to compare. These indices can go upto 15511210043330985984000000 , I agree. So we need to have only 2 variables that can hold this big a value. Also note that initially OP started out without any language preference. So Java's BigInteger class can help to hold this big an integer. In short, its not impossible! – Pavan Manjunath Jun 05 '12 at 12:17
@PavanManjunath It's not a checksum anymore though, it's a kind of unique signature. Btw, BigInteger data is stored in an array. So to compare two "checksums", you end up comparing two arrays. Oh wait! Isn't it what we're trying to avoid in the first place ? ;D – Nicolas Repiquet Jun 05 '12 at 12:37

score 1 · Answer 2 · edited May 23 '17 at 12:19

1

I think what you want is in the answer of the following thread:

Fast permutation -> number -> permutation mapping algorithms

You just take the number your permutation is mapped to and take that as your Checksum. As there is exactly one Checksum per permutation there can't be a smaller Checksum that is collision free.

edited May 23 '17 at 12:19

Community

1
1

answered Jun 05 '12 at 08:10

Philosophus42

472
3
11

Replacing, e.g., comparisons of 25-element arrays with comparisons of about 12 bytes - a win, once you've amortised the cost of computing the _numbers_ (space being no issue, as the arrays are equivalent and bigger). Depending on the probability of equal checksum values for a considerable number of checks, a fast&small checksum may be slower. E.g., with each permutation equally probable, expect it to be much faster. – greybeard May 15 '15 at 08:47

score 1 · Answer 3 · answered May 15 '15 at 08:21

How about the checksum of weighted sum? Let's take an example for [0,1,2,3]. First pick a seed and limit, let's pick a seed as 7 and limit as 10000007.

a[4] = {0, 1, 2, 3}
limit = 10000007, seed = 7
result = 0
result = ((result + a[0]) * seed) % limit = ((0 + 0) * 7)) % 10000007 = 0
result = ((result + a[1]) * seed) % limit = ((0 + 1) * 7)) % 10000007 = 7
result = ((result + a[2]) * seed) % limit = ((7 + 2) * 7)) % 10000007 = 63
result = ((result + a[3]) * seed) % limit = ((63 + 3) * 7)) % 10000007 = 462

Your checksum is 462 for that [0, 1, 2, 3]. The reference is http://www.codeabbey.com/index/wiki/checksum

Yusuf X · Answer 4 · 2012-06-05T07:06:09.377

0

For an array of N unique integers from 1 to N, just adding up the elements will always be N*(N+1)/2. Therefore the only difference is in the ordering. If by "checksum" you imply that you tolerate some collisions, then one way is to sum the differences between consecutive numbers. So for example, the delta checksum for {1,2,3,4} is 1+1+1=3, but the delta checksum for {4,3,2,1} is -1+-1+-1=-3.

No requirements were given for collision rates or computational complexity, but if the above doesn't suit, then I recommend a position dependent checksum

edited Jun 05 '12 at 07:06

answered Jun 05 '12 at 06:39

Yusuf X

14,513
5
35
47

1

Nice and simple logic but it breaks easily- {4,2,3,1} also yields a checksum of `-3` – Pavan Manjunath Jun 05 '12 at 06:42
1

{3,2,1,4} is 1+1+3 = 5 & {3,1,2,4} is 2+1+2 = 5. Both turns out same. So, how will this work? – Jay Jun 05 '12 at 06:42
Sorry, I forgot to mention that the number 0 will also be in the array. – MinaHany Jun 05 '12 at 06:42
Right, that's why I mention "collision". Checksums normally have them. Good checksums have few collisions and are computationally inexpensive. But if you want to guarantee no collisions, then you have to go through each element in the array. – Yusuf X Jun 05 '12 at 06:49
@YusufX agreed that collisions are inevitable when we are looking out for time optimization and simple algorithms, but collision rate shouldn't be too high even for simple cases as is in your logic – Pavan Manjunath Jun 05 '12 at 06:56
1

No requirements were given for collision rates or computational complexity, but if my answer doesn't suit, then I recommend a position-dependent checksum, as described here: http://en.wikipedia.org/wiki/Checksum#Position-dependent_checksums – Yusuf X Jun 05 '12 at 07:01
@YusufX I think you should edit your answer and include the above link. It seems to be on target and useful – Pavan Manjunath Jun 05 '12 at 07:04

score 0 · Answer 5 · edited May 23 '17 at 12:33

From what I understand your array contains a permutation of numbers from 0 to N-1. One check-sum which will be useful is the rank of the array in its lexicographic ordering. What does it means ? Given 0, 1, 2 You have the possible permutations

  1: 0, 1, 2
  2: 0, 2, 1
  3: 1, 0, 2
  4: 1, 2, 0
  5: 2, 0, 1
  6: 2, 1, 0

The check-sum will be the first number, and computed when you create the array. There are solutions proposed in

Find the index of a given permutation in the list of permutations in lexicographic order

which can be helpful, although it seems the best algorithm was of quadratic complexity. To improve it to linear complexity you should cache the values of the factorials before hand.

The advantage? ZERO collision.

EDIT: Computation The value is like the evaluation of a polynomial where factorial is used for the monomial instead of power. So the function is

f(x0,....,xn-1) = X0 * (0!) + X1 * (1!) + X2 * (2!) +...+ Xn-1 * (n-1!)

The idea is to use each values to get a sub-range of permutations, and with enough values you pinpoint an unique permutation.

Now for the implementation (like the one of a polynomial):

pre compute 0!.... to n-1! at the beginning of the program
Each time you set an array you use f(elements) to compute its checksum
you compare in O(1) using this checksum

Thanks for the answer! Would you please explain a bit more on what the calculation is? I can't seem to get what's on the other thread. — MinaHany, Jun 05 '12 at 09:16
Though the idea of using the index as a checksum is a pretty good one ( 0% collision ) but the complexity of calculating the checksum, the computations of factorials etc cant beat the straightforward `array1[i] == array2[i]` O(n) comparison. — Pavan Manjunath, Jun 05 '12 at 10:33
I'm doing something wrong or this doesn't work: arrays [0,1,2,3] and [2,1,0,3] both generate a checksum of 18. — MinaHany, Jun 05 '12 at 11:33

Checksum for an integer array?

5 Answers5