4

I would like to generate millions of passwords randomly between 4 millions and 50 millions. The problem is the time it takes the processor to process it.

I would like to know if there is a solution to generate a lot of passwords in only a few seconds (max 1 minute for 50 millions).

I've done that for now but it takes me more than 3 min (with a very good config and I would like to run it on small config).

private final static String policy = "azertyuiopqsdfghjklmwxcvbnAZERTYUIOPQSDFGHJKLMWXCVBN1234567890";
    private static List<String> names = new ArrayList<String>();
    
    
    public static void main(String[] args) {
        names.add("de");
        init();
    }
    
    
    
    private static String generator(){
        String password="";
        int randomWithMathRandom = (int) ((Math.random() * ( - 6)) + 6);
        for(var i=0;i<8;i++){
            randomWithMathRandom = (int) ((Math.random() * ( - 6)) + 6);
            password+= policy.charAt(randomWithMathRandom);
        }
        return password;
    }
    
    public static void init() {
        for (int i = 0; i < 40000000; i++) {
            names.add(generator());     
        }
    }

btw I can't take a ready-made list. I think the most 'expensive' waste of time is the input into the list.

My current config : ryzen 7 4800h rtx 2600 SSD NVME RAM 3200MHZ

UPDATE : I tried with 20Millions and it's display an error: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"

Forcela8
  • 61
  • 6
  • for such numbers, you might need to call your OS functions, which will do this _much_ faster then java. – Eugene Apr 01 '21 at 17:33
  • "*The problem is the time it takes the processor to process it.*" - Randomness **is** expensive, even if if is only a PRNG. --- "*I would like to know if there is a solution to generate a lot of passwords in only a few seconds*" - Probably yes. But I would imagine they are not very high quality. – Turing85 Apr 01 '21 at 17:33
  • 1
    Just pick a few million UUIDs. – luk2302 Apr 01 '21 at 17:33
  • The basic suggestion I'm feeling to give is: profile the code and see which step takes longer. – watery Apr 01 '21 at 17:36
  • For 40 millions, it takes only 18 seconds for me , can your [edit] your post and add the config of your laptop ? – azro Apr 01 '21 at 17:37
  • Indeed it really does not take much time for me neither, however try using a StringBuilder already, instead of concatenating strings in a loop. – Yassin Hajaj Apr 01 '21 at 17:38
  • @azro this is most probably related to the amount of heap and JDK version. – Eugene Apr 01 '21 at 17:40
  • Why do you generate random numbers in the interval `[0;6[` only ? – azro Apr 01 '21 at 17:41
  • I just ran a small test. Please make sure you are not blocked by memory constraints. My system hat 5 GB of free RAM, which was sufficient to store ~`20_000_000` 32-byte passwords in memory before my machine started swapping and performance degraded. I was able to generate `10_000_000` 128-byte passwords in 19 seconds. – Turing85 Apr 01 '21 at 17:48
  • The only problem, that i need to run it to any pc so to not 'touch' the limitation of the heap... :'( – Forcela8 Apr 01 '21 at 17:50
  • Wlog. it is not possible to store all passwords in memory, and I/O is slow. So... yeah. Pick your poison I guess. – Turing85 Apr 01 '21 at 17:54
  • That's for a cyber security project. I know it's possible to do it with C++ but that not my 'main' langage – Forcela8 Apr 01 '21 at 17:57
  • Why not store the data in an indexed form? For example, you could encode the 50 million passwords in image form, or use an RNG seed that always puts out the same passwords in the same order. Lastly, it really depends on _why_ you're generating 50 million of these to begin with. – Rogue Apr 01 '21 at 18:18
  • Why can't you use a ready made list? What's the difference with getting them one at a time from a file rather than a data structure? Or why can't you use each password as it's generated so you don't need to store it? – WJS Apr 01 '21 at 19:42
  • This sounds like an XY problem to me. What is it you want to do with the passwords? I would think that the best solution to your problem would depend on the answer to that question. If consuming the passwords is going to take significantly longer than creating them, then the time needed to create each one shouldn't matter. – CryptoFool Apr 01 '21 at 20:28

2 Answers2

4

Storing 50 million passwords as Strings in-memory could cause problems since either the stack or the heap may overflow. From this point of view, I think the best we can do is to generate a chunk of passwords, store them in a file, generate the next chunk, append them to the file... until the desired amount of passwords is created. I hacked together a small program that generates random Strings of length 32. As alphabet, I used all ASCII-characters between '!' (ASCII-value 33) and '~' (ASCII-value 126).

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Random;
import java.util.concurrent.TimeUnit;

class Scratch {
  private static final int MIN = '!';
  private static final int MAX = '~';
  private static final Random RANDOM = new Random();
  
  public static void main(final String... args) throws IOException {
    final Path passwordFile = Path.of("passwords.txt");
    if (!Files.exists(passwordFile)) {
      Files.createFile(passwordFile);
    }
    final DecimalFormat df = new DecimalFormat();
    final DecimalFormatSymbols ds = df.getDecimalFormatSymbols();
    ds.setGroupingSeparator('_');
    df.setDecimalFormatSymbols(ds);
    final int numberOfPasswordsToGenerate = 50_000_000;
    final int chunkSize = 1_000_000;
    final int passwordLength = 32;
    int generated = 0;
    int chunk = 0;
    final long start = System.nanoTime();
    while (generated < numberOfPasswordsToGenerate) {
      final StringBuilder passwords = new StringBuilder();
      for (
          int index = chunk * chunkSize;
          index < (chunk + 1) * chunkSize && index < numberOfPasswordsToGenerate;
          ++index) {
        final StringBuilder password = new StringBuilder();
        for (int character = 0; character < passwordLength; ++character) {
          password.append(fetchRandomLetterFromAlphabet());
        }
        passwords.append(password.toString()).append(System.lineSeparator());
        ++generated;
        if (generated % 500_000 == 0) {
          System.out.printf(
              "%s / %s%n",
              df.format(generated),
              df.format(numberOfPasswordsToGenerate));
        }
      }
      ++chunk;
      Files.writeString(passwordFile, passwords.toString(), StandardOpenOption.APPEND);
    }
    final long consumed = System.nanoTime() - start;
    System.out.printf("Done. Took %d seconds%n", TimeUnit.NANOSECONDS.toSeconds(consumed));
  }

  private static char fetchRandomLetterFromAlphabet() {
    return (char) (RANDOM.nextInt(MAX - MIN + 1) + MIN);
  }
}

On my laptop, the program yields good results. It completes in about 33 seconds and all passwords are stored in a single file.

This program is a proof of concept and not production-ready. For example, if a password.txt does already exist, the content will be appended. For me, the file already has 1.7 GB after one run, so be aware of this. Furthermore, the generated passwords are temporarily stored in a StringBuilder, which may present a security risk since a StringBuilder cannot be cleared (i.e. its internal memory structured cannot be zeroed). Performance could further be improved by running the password generation multi-threaded, but I will leave this as an exercise to the reader.

To use the alphabet presented in the question, we can remove static fields MIN and MAX, define one new static field private static final char[] ALPHABET = "azertyuiopqsdfghjklmwxcvbnAZERTYUIOPQSDFGHJKLMWXCVBN1234567890".toCharArray(); and re-implement fetchRandomLetterFromAlphabet as:

  private static char fetchRandomLetterFromAlphabet() {
    return ALPHABET[RANDOM.nextInt(ALPHABET.length)];
  }

We can use the following code-snippet to read-back the n-th (starting at 0) password from the file in constant time:

final int n = ...;
final RandomAccessFile raf = new RandomAccessFile(passwordFile.toString(), "r");
final long start = System.nanoTime();
final byte[] bytes = new byte[passwordLength];

// byte-length of the first n passwords, including line breaks:
final int offset = (passwordLength + System.lineSeparator().toCharArray().length) * n;

raf.seek(offset); // skip the first n passwords
raf.read(bytes);

// reset to the beginning of the file, in case we want to read more passwords later:
raf.seek(0); 

System.out.println(new String(bytes));
Turing85
  • 18,217
  • 7
  • 33
  • 58
  • If I want to make a combinasion of not all character but only with this one: azertyuiopqsdfghjklmwxcvbnAZERTYUIOPQSDFGHJKLMWXCVBN1234567890 In my project i need to only make random character like this (your program works great in 3 sec for 5M password I'm impressed) – Forcela8 Apr 01 '21 at 18:58
  • very nice. If you could explain: index < (chunk + 1) * chunkSize && index < numberOfPasswordsToGenerate; – Khanna111 Apr 01 '21 at 18:58
  • @Forcela8 Just incorporate your alphabet :) shouldn't be tooo hard. – Turing85 Apr 01 '21 at 18:59
  • @Khanna111 It's a technical detail in case the the total number of passwords is not evenly divisible by the chunk size. – Turing85 Apr 01 '21 at 19:00
  • @Forcela8 I updated the code. Random letter generation is isolated-out in a separate method. This should make it quite easy for you to drop-in the desired alphabet. – Turing85 Apr 01 '21 at 19:04
  • @Turing85 That's why I ask the question. the min and max represent the ASCII code ? So if i need to implements my alphabet i need to search in the ascii table i guess – Forcela8 Apr 01 '21 at 19:07
  • @Forcela8 no. just define a constant `String` (as you did in your example) and pick a random character from this string. You can almost copy-paste the letter-generation from your code :) – Turing85 Apr 01 '21 at 19:10
  • @Turing85 mmmhhh i really don't understand why YOUR algorithm is so fast compared to mine. I took like only 1 sec for 1 Million of password that so frucking strange :'(. I admit that I only know the basics of Java and it frustrates me to think that I haven't thought of doing this. – Forcela8 Apr 01 '21 at 19:11
  • @Forcela8 As I said: please check the memory on your system. My first attempt ran into memory issues (OS started swapping). Hence the "writing to a file in chunks". – Turing85 Apr 01 '21 at 19:13
  • @Forcela8 I added the changes necessary to use your alphabet at the bottom of my answer. In my tests, this had no impact on performance. – Turing85 Apr 01 '21 at 19:23
  • @Turing85 Is there also a technique to browse such a big file in a very short time? Maybe I ask to much things (oupsi :'( ) – Forcela8 Apr 01 '21 at 19:24
  • [Yes](https://stackoverflow.com/a/29637442/4216641). This solutions needed about 5.5s to read the last password (worst-case scenario). So I would expect it to consume 2.75s on average per password read. I am certain, however, that we can do better since all lines are of equal length. – Turing85 Apr 01 '21 at 19:31
  • @Turing85 Last question, why chunkSize is 1_000_000 what does it mean ? Why we can't put 10 or 100 ? – Forcela8 Apr 01 '21 at 19:41
  • @Forcela8 the `1_000_000` was a guesstimation on my part. I know that my machine can handle storing `1_000_000` passwords in memory and we do not want to hit the I/O too frequently since I/O is expensive (see [Latency numbers every programmer should know](https://gist.github.com/hellerbarde/2843375)). Feel free to experiment with the `chunkSize`. I btw edited my answer and added code ot read-back the `n`-th password from the password file in constant time. This should improve read performance significantly. – Turing85 Apr 01 '21 at 20:03
  • @Forcela8 And please consider [upvoting](https://stackoverflow.com/help/privileges/vote-up) my answer (if you haven't already) and [accepting it as answer](https://stackoverflow.com/help/someone-answers) if it helped you / answered your question. – Turing85 Apr 01 '21 at 20:07
0

I can give you some tips to optimize your code and make it faster, you can use them along with other.

  1. If you know the number of passwords you need, you should create a string array and fill it with the variable in your loop.
  2. If you have to use a dynamic size data structure, use linked list. Linked list is better than array list when adding elements is your main target, and worse if you want to access them more then add them.
  3. Use string builder instead of += operator on strings. The += operator is very 'expensive' in complexity of time, because it always creates new strings. Using string builder append method can speed up your code.
  4. Instead of using Math.random() and multiple the result to your range number, create a static Random object and use yourRandomInstance.next(int range).
  5. Consider use the ascii table to get random character instead of using str.charAt(int index) method, it may speed up your code too, i offer you to check it.
Programmer
  • 803
  • 1
  • 6
  • 13