Hash function to iterate through a matrix

Question

Given a NxN matrix and a (row,column) position, what is a method to select a different position in a random (or pseudo-random) order, trying to avoid collisions as much as possible?

For example: consider a 5x5 matrix and start from (1,2)

I'm looking for a method like

(x,y) hash (x,y);

to jump to a different position in the matrix, avoiding collisions as much as possible (do not care how to return two different values, it doesn't matter, just think of an array).

Of course, I can simply use

row = rand()%N;
column = rand()%N;

but it's not that good to avoid collisions.
I thought I could apply twice a simple hash method for both row and column and use the results as new coordinates, but I'm not sure this is a good solution.

Any ideas?

What is your goal? Is it to iterate the matrix in a random order? — svick, Sep 18 '11 at 15:12

score 3 · Answer 1 · edited May 23 '17 at 12:32

3

Can you determine the order of the walk before you start iterating? If your matrices are large, this approach isn't space-efficient, but it is straightforward and collision-free. I would do something like:

Generate an array of all of the coordinates. Remove the starting position from the list.
Shuffle the list (there's sample code for a Fisher-Yates shuffle here)
Use the shuffled list for your walk order.

edited May 23 '17 at 12:32

Community

1
1

answered Sep 18 '11 at 15:35

Seamus Campbell

17,816
3
52
60

The wikipedia page for the [Fisher-Yates shuffle](http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle) shows a way to initialize and shuffle simultaneously, which can be used here – Hasturkun Sep 18 '11 at 15:37

score 2 · Answer 2 · edited Apr 13 '17 at 12:19

Edit 2 & 3: A modular approach: Given s array elements, choose a prime p of form 2+3*n, p>s. For i=1 to p, use cells (iii)%p when that value is in range 1...s-1. (For row-length r, cell #c subscripts are c%r, c/r.)

Effectively, this method uses H(i) = (iii) mod p as a hash function. The reference shows that as i ranges from 1 to p, H(i) takes on each of the values from 0 to p-1, exactly one time each.

For example, with s=25 and p=29 or 47, this uses cells in following order:

p=29: 1 8 6 9 13 24 19 4 14 17 22 18 11 7 12 3 15 10 5 16 20 23 2 21 0
p=47: 1 8 17 14 24 13 15 18 7 4 10 2 6 21 3 22 9 12 11 23 5 19 16 20 0

according to bc code like

s=25;p=29;for(i=1;i<=p;++i){t=(i^3)%p; if(t<s){print " ",t}}

The text above shows the suggestion I made in Edit 2 of my answer. The text below shows my first answer.

Edit 0: (This is the suggestion to which Seamus's comment applied): A simple method to go through a vector in a "random appearing" way is to repeatedly add d (d>1) to an index. This will access all elements if d and s are coprime (where s=vector length). Note, my example below is in terms of a vector; you could do the same thing independently on the other axis of your matrix, with a different delta for it, except a problem mentioned below would occur. Note, "coprime" means that gcd(d,s)=1. If s is variable, you'd need gcd() code. Example: Say s is 10. gcd(s,x) is 1 for x in {1,3,7,9} and is not 1 for x in {2,4,5,6,8,10}. Suppose we choose d=7, and start with i=0. i will take on values 0, 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, which modulo 10 is 0, 7, 4, 1, 8, 5, 2, 9, 6, 3, 0. Edit 1 & 3: Unfortunately this will have a problem in the two-axis case; for example, if you use d=7 for x axis, and e=3 for y-axis, while the first 21 hits will be distinct, it will then continue repeating the same 21 hits. To address this, treat the whole matrix as a vector, use d with gcd(d,s)=1, and convert cell numbers to subscripts as above.

+1 for a useful addition here, but it's worth noting that the output is not especially random looking. You're basically identifing a particular move and repeating it, wrapping around the edges as needed. In this case, it's three down and three to the left, which I think will be pretty identifiable as a pattern after just a couple iterations. — Seamus Campbell, Sep 18 '11 at 16:49
With wraparound, I think it would take at least three or four iterations to be obvious. :) Anyway, I've added Edit 2 with a modular method that presents a far less obvious pattern, except for the first two or three cells used. — James Waldby - jwpat7, Sep 18 '11 at 17:46
Nice solution. What is `n` in the modular approach you provided? Besides, if `s` is big, it may be not that easy to find a prime with such a form, and it seems to be the only parameter that changes the sequence... — Manlio, Sep 19 '11 at 11:32
`n` is any odd number large enough that `2+3*n >= s`. I have posted a separate answer below that shows a half-dozen lines of code for finding such a prime. For any array you can display on screen it will take less than 10 microseconds on most PC's. — James Waldby - jwpat7, Sep 20 '11 at 17:00

score 1 · Answer 3 · answered Sep 18 '11 at 15:04

1

If you just want to iterate through the matrix, what is wrong with row++; if (row == N) {row = 0; column++}?

If you iterate through the row and the column independently, and each cycles back to the beginning after N steps, then the (row, column) pair will interate through only N of the N^2 cells of the matrix.

If you want to iterate through all of the cells of the matrix in pseudo-random order, you could look at questions here on random permutations.

answered Sep 18 '11 at 15:04

mcdowella

19,301
2
19
25

I don't want to simply iterate through a matrix, I need to jump to different positions randomly. I'm sorry, my question seems not to be clear. I've edited it adding an example. – Manlio Sep 18 '11 at 15:27
In that case, since you can translate between a (row, column) pair and a number between 1..N^2, you are asking for a random permutation of the numbers 1..N^2, as described in more detail by Seamus Campbell. – mcdowella Sep 18 '11 at 17:03

James Waldby - jwpat7 · Answer 4 · 2011-09-20T17:17:11.870

This is a companion answer to address a question about my previous answer: How to find an appropriate prime p >= s (where s = the number of matrix elements) to use in the hash function H(i) = (i*i*i) mod p.

We need to find a prime of form 3n+2, where n is any odd integer such that 3*n+2 >= s. Note that n odd gives 3n+2 = 3(2k+1)+2 = 6k+5 where k need not be odd. In the example code below, p = 5+6*(s/6); initializes p to be a number of form 6k+5, and p += 6; maintains p in this form.

The code below shows that half-a-dozen lines of code are enough for the calculation. Timings are shown after the code, which is reasonably fast: 12 us at s=half a million, 200 us at s=half a billion, where us denotes microseconds.

// timing how long to find primes of form 2+3*n by division
// jiw 20 Sep 2011
#include <stdlib.h>
#include <stdio.h>
#include <sys/time.h>
double ttime(double base) {
  struct timeval tod;
  gettimeofday(&tod, NULL);
  return tod.tv_sec + tod.tv_usec/1e6 - base;
}
int main(int argc, char *argv[]) {
  int d, s, p, par=0;
  double t0=ttime(0);
  ++par; s=5000;  if (argc > par) s = atoi(argv[par]);
  p = 5+6*(s/6);
  while (1) {
    for (d=3; d*d<p; d+=2)
      if (p%d==0) break;
    if (d*d >= p) break;
    p += 6;
  }
  printf ("p = %d after %.6f seconds\n", p, ttime(t0));
  return 0;
}

Timing results on 2.5GHz Athlon 5200+:

qili ~/px > for i in 0 00 000 0000 00000 000000; do ./divide-timing 500$i; done
p = 5003 after 0.000008 seconds
p = 50021 after 0.000010 seconds
p = 500009 after 0.000012 seconds
p = 5000081 after 0.000031 seconds
p = 50000021 after 0.000072 seconds
p = 500000003 after 0.000200 seconds

qili ~/px > factor 5003 50021 500009 5000081 50000021 500000003
5003: 5003
50021: 50021
500009: 500009
5000081: 5000081
50000021: 50000021
500000003: 500000003

Update 1 Of course, timing is not determinate (ie, can vary substantially depending on the value of s, other processes on machine, etc); for example:

qili ~/px > time for i in 000 004 010 058 070 094 100 118 184; do ./divide-timing 500000$i; done
p = 500000003 after 0.000201 seconds
p = 500000009 after 0.000201 seconds
p = 500000057 after 0.000235 seconds
p = 500000069 after 0.000394 seconds
p = 500000093 after 0.000200 seconds
p = 500000099 after 0.000201 seconds
p = 500000117 after 0.000201 seconds
p = 500000183 after 0.000211 seconds
p = 500000201 after 0.000223 seconds
real 0m0.011s
user 0m0.002s
sys  0m0.004s

score 0 · Answer 5 · answered Sep 18 '11 at 15:31

Consider using a double hash function to get a better distribution inside the matrix, but given that you cannot avoid colisions, what I suggest is to use an array of sentinels and mark the positions you visit, this way you are sure you get to visit a cell once.

Hash function to iterate through a matrix

5 Answers5