6

I badly need a random file generator that produces truly random, incompressible dummy files.

I ended up with this Delphi code. It works, but it's painfully slow:

 var
    Buf     : Integer;
    TheFile : TFileStream;
 begin
      TheFile := TFileStream.Create(FileName, fmCreate OR fmOpenReadWrite);
      with TheFile do
      begin
           for i := 0 to FileSize do    // Iterate
           begin
                Buf := Random(999999) * i;
                WriteBuffer(Buf, SizeOf(Buf));
           end;    // for
      end;    // with
 end;

My question is: is there a fast random file generator that I can use? Delphi code and command-line tools are both acceptable, as long as:

  1. I can run it on Windows without manual intervention (I need this for my tests, no intervention is allowed)
  2. It's fast
  3. The generated files are incompressible (i.e., compressing a generated file saves no space)

EDIT: For those interested, I applied the advice I received here and wrote this function. It's fast enough, and 7-Zip has a hard time compressing the generated data.

TheDude
  • Profile your code and find out where it is spending the most time. – japreiss Apr 18 '12 at 19:17
  • 1
    He's probably better off using CryptoAPI filling a buffer that is used for writing to the file. There is some C code - http://msdn.microsoft.com/en-us/library/aa382048.aspx that is a good starting point – Anya Shenanigans Apr 18 '12 at 19:18
  • What type is Buf? What type is i? Should the termination be 'FileSize-1'? – Martin James Apr 18 '12 at 19:21
  • Please define "fast" and "slow." – Rob Kennedy Apr 18 '12 at 19:24
  • @Petesh: I tried it, but the data generated was highly compressible (though I'm pretty sure I missed something) – TheDude Apr 18 '12 at 19:25
  • A Buf type of 'Buf:array [0..4095] of byte;' would be good.. – Martin James Apr 18 '12 at 19:25
  • 5
    @Gdhami - fill up a page-sized buffer, then write it. Writing one int at a time will be slow, use: Buf:array [0..2047] of integer; – Martin James Apr 18 '12 at 19:27
  • @RobKennedy: generating 100 files (5 MB each) takes 40 minutes(!) I've seen tools that could do this in a few seconds (although they are not available as command-line tools) – TheDude Apr 18 '12 at 19:32
  • @MartinJames: Good suggestion, I'm trying it now... – TheDude Apr 18 '12 at 19:33
  • @MartinJames: That did it, please post it as an answer so that I can accept it, thanks! – TheDude Apr 18 '12 at 19:43
  • 1
    Allocate a big block of memory. Fill it with random numbers. Then write it as a whole piece. – Andrej Kirejeŭ Apr 18 '12 at 20:00
  • 5
    If you ever need to write small chunks at a time and still want it to be fast, you can use the code from here: http://stackoverflow.com/questions/5639531/buffered-files-for-faster-disk-access/5639712#5639712 – David Heffernan Apr 18 '12 at 20:18
  • Using Random will never create a truly random file. With enough background knowledge, people can guess your algorithm, use the same seed, and predict your sequence. – Pieter B Apr 19 '12 at 07:07
  • @PieterB: As I said, I need this to generate random data for my **private test cases**, it's not meant to be shared with others, so no security implication here – TheDude Apr 19 '12 at 07:32
  • 1
    @TheDude You should provide [your code](https://pastebin.com/SHwPBFZB) as an answer to your question! (I nearly missed it ;-) – yonojoy Jan 05 '18 at 09:07

2 Answers

9

Use a buffer of one page (4096 bytes), or a multiple of the page size, and write it out in one call. Writing one integer at a time will be slow.

Martin James
1

You can use my generate_random_file.py script (Python 3) that I used to generate test data in a project of mine.

  • It works both on Linux and Windows.
  • It is very fast, because it uses os.urandom() to generate the random data in chunks of 256 KiB instead of generating and writing each byte separately.
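
The chunked approach described above can be sketched roughly as follows. This is only an illustration, not the actual generate_random_file.py script: the function name `write_random_file` and its parameters are hypothetical, and only `os.urandom()` and the 256 KiB chunk size come from the description.

```python
import os

CHUNK_SIZE = 256 * 1024  # 256 KiB, the chunk size mentioned above


def write_random_file(path, size, chunk_size=CHUNK_SIZE):
    """Write `size` bytes of OS-provided random data to `path`, chunk by chunk.

    Hypothetical helper for illustration; not the original script's API.
    """
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            # The last chunk may be shorter than chunk_size.
            n = min(chunk_size, remaining)
            f.write(os.urandom(n))
            remaining -= n
```

For example, `write_random_file("dummy.bin", 5 * 1024 * 1024)` would create one 5 MB file like those mentioned in the question. Each call hands the operating system a large block instead of a few bytes, which is the same idea as the page-sized buffer suggested for the Delphi code.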
robert
  • @Will: How dare you call this spam? Yes, I have posted this answer in 3 related questions, but this is generally considered to be acceptable (http://meta.stackexchange.com/questions/17455/is-it-ok-that-i-just-posted-my-same-answer-to-several-related-questions). If you want me to improve my answer, then tell me why. Simply deleting it is bad practice. – robert Feb 05 '13 at 14:20