1

I am trying to implement a set of encode and decode steganography functions in C, where i use rand() to randomly scatter my data along an array.

I use rand to calculate a random index like so:

unsigned int getRandIndex(int size) {
  return ((unsigned)(rand() * 8)) % size;
}

I seed rand like so:

unsigned long seed = time(NULL);
srand(seed);

I include the seed along with my data as a part of a header that contains a checksum and length.

The problem i have is that while decoding, when i seed the rand function again with the seed i decoded from the data, rand() tends to produce the slightest variations like so:

Index at encode:                  | Index at decode:
---------------------------------------------------------------------
.
.
.
At:                        142568 | At:                        142568
At:                        155560 | At:                        155552
                               --                                  --
At:                        168184 | At:                        168184
.
.
.

Messing up my decoded data.

Is this a limitation of the rand() function? I am 100% sure that the seed is being decoded correctly bit-for-bit as i have verified that.

Faraz
  • 433
  • 3
  • 7
  • 2
    What makes you think `unsigned long seed = time(NULL);` produces the ***same*** seed each time? Are you frozen in time? – Adrian Mole Feb 02 '21 at 17:55
  • It's probably not your problem, but `rand() * 8` is likely to exhibit undefined behavior because of integer overflow. – Paul Hankin Feb 02 '21 at 17:56
  • 1
    @AdrianMole He said he saves the seed with his data, he doesn't use `time(NULL)` when decoding. – Barmar Feb 02 '21 at 17:56
  • @AdrianMole I want a different seed every time encode is ran. the seed will be included with the data to be encoded – Faraz Feb 02 '21 at 17:57
  • 2
    It's very unlikely that rand(), even if it were seeded wrongly, would produce only slightly different results. More likely the problem is elsewhere. – Paul Hankin Feb 02 '21 at 17:57
  • 1
    The C Standard dictates that the same seed produces the same sequence - on any given system, that is. – Adrian Mole Feb 02 '21 at 17:58
  • @PaulHankin I need it to be a multiple of 8, as i am going to encode 8 bit chunks starting at the calculated index, is there a better way? Currently i am handling the overflow by casting it to an unsigned. Also, it works 90% of the time with the same data, so it cannot be that is causing it, can it? – Faraz Feb 02 '21 at 17:59
  • 2
    Someone who compiles your program on a different system will likely get a different sequence from the saved seed, so you may need to rethink this approach. See also [this question](https://stackoverflow.com/questions/65980429/same-srand-seeds-produces-different-values-on-different-computers) from two days ago. – Steve Summit Feb 02 '21 at 18:02
  • There's lots of ways, maybe `((unsigned)rand() * 8) % size` or `rand() % (size/8) * 8` or `rand() % size & ~7`. Overflow of signed integer is undefined behavior. – Paul Hankin Feb 02 '21 at 18:04
  • While it is unlikely that this behaviour you are seeing would be caused by `rand`, I would **never** use `rand` for this purpose. The inner workings of `rand` are not specified, and it is guaranteed to give the same numbers in the same system... under conditions, but given the fact that even C standard libraries are dynamically linked into the executable you could soon find out that decoding wouldn't even work on the *same* system using the *same* compiled executable. – Antti Haapala -- Слава Україні Feb 02 '21 at 18:04
  • @SteveSummit wow i hadn't thought of that. – Faraz Feb 02 '21 at 18:05
  • @AnttiHaapala i can be 100% certain that the encode and decode functions will be run on the same machine, so would it still matter if i used rand()? – Faraz Feb 02 '21 at 18:07
  • @FarazShaikh that's what my last sentence says. What you should do is use your *own PRNG* for which you can make the guarantee. – Antti Haapala -- Слава Україні Feb 02 '21 at 18:11
  • 1
    @AdrianMole: Re “The C Standard dictates that the same seed produces the same sequence - on any given system, that is”: Are you sure? If I have two different implementations of the C standard library on the same system, does the standard dictate they produce the same sequence? – Eric Postpischil Feb 02 '21 at 18:18
  • You shouldn't generate a random index like this, for the rare case you get the same one twice. It'd be safer to create an array from 0 to size-1, shuffle that and then draw numbers from there sequentially. – Reti43 Feb 02 '21 at 18:24
  • @Reti43 i account of collisions, essentially generating a new index if a collision occurs, the thing is the size of my data is small enough compared to the size of the array that generally the effect of collisions is negligible – Faraz Feb 02 '21 at 18:26
  • @EricPostpischil Sorry. As I'm sure you know, I should have used *implementation/system combination* instead of just *system*. – Adrian Mole Feb 02 '21 at 18:42
  • Does this answer your question? [How predictable is the result of rand() between individual systems?](https://stackoverflow.com/questions/59600866/how-predictable-is-the-result-of-rand-between-individual-systems) – Peter O. Feb 02 '21 at 19:41

3 Answers3

5

C 2018 7.22.2.2 2 says:

The srand function uses the argument as a seed for a new sequence of pseudo-random numbers to be returned by subsequent calls to rand. If srand is then called with the same seed value, the sequence of pseudo-random numbers shall be repeated.

This does not explicitly say the sequence is the same in different executions of the program, but I take that as understood. It does not, however, extend to different C implementations, including those that result from linking in a different version of the C standard library.

Eric Postpischil
  • 195,579
  • 13
  • 168
  • 312
1

OpenBSD's implementation of rand ignores all calls to srand and instead provides automatically seeded, cryptographically strong random numbers as if you were using arc4random; this is documented as an intentional deviation from the C standard. This demonstrates that even if your files are always decoded on the exact same computer that produced them, there's no guarantee rand will do what you want.

If you need reproducible sequences of pseudorandom numbers, you should do what industrial grade statistics software does, which is ship your own PRNG and document your choice of algorithm. (See for instance the ?Random help page for R.)

Also, since this is a steganography application, you need to use a cryptographically strong PRNG (also known as a stream cipher); merely statistically strong PRNGs like the Mersenne Twister are not good enough.

zwol
  • 135,547
  • 38
  • 252
  • 361
0

No. For reproducibility using rand (which is not exactly specified and inherently uses global state) is terrible for multiple reasons:

  1. you might use a different compiler/system, which may use a different RNG,

  2. you might use the same compiler, but updated to a new version, which uses a different RNG,

  3. you might use the same compiler, same version, but with an updated libc, that uses a different RNG,

  4. you use the same compiler and library version, but have any other non-deterministic call order of the RNG, including but not limited to:

    a) some other source of randomness,

    b) user input,

    c) reordering of concurrency from run-to-run, or

    d) any of the above in any of the libraries that you use.

orlp
  • 112,504
  • 36
  • 218
  • 315
  • I see, are there any other ways to guarantee reproducibility while also producing "random" indices? – Faraz Feb 02 '21 at 18:09
  • @FarazShaikh Use a RNG that you control and specify (e.g PCG or xoroshiro) using local state only. – orlp Feb 02 '21 at 18:13
  • I will use my own RNG function. Thanks! – Faraz Feb 02 '21 at 18:15
  • @FarazShaikh I hope you mean that you'll choose a well-respected and tested RNG and use that, not that you will just try to make one yourself. – orlp Feb 02 '21 at 18:16
  • @EricPostpischil According to the comments above OP does save the seed that was used and load that seed rather than the current time when reproducing. – orlp Feb 02 '21 at 18:17
  • @orlp do you have any suggestions for a rng function that is 100% deterministic? I am not too sure what to even google to start my search lol – Faraz Feb 02 '21 at 18:20
  • @EricPostpischil yes the seed is saved, the time is captured and saved then the saved value is retrieved and used to reseed the rng. – Faraz Feb 02 '21 at 18:22