3

The code below has taken 5 hours to get 1m done lol this is way too slow.

int numbers = 10000000
int count = 0
def data = []
File file = new File("out.txt")

while (count < numbers) {
    def number = makeNumber(numbers)
    if (!data.contains(number)) {
        print count + ':Adding ' + number + '\n'
        data << number
        count++
    }
}

for (i in data) {
    print 'Appending ' + i + '\n'
    file.append(i + '\n')
}

String makeNumber(numbers) {
    def range = "83"
    return range + (new Random().nextInt(20000000) + numbers)
}

I am currently trying out java.security.SecureRandom as in this request to generate the numbers but its an unknown API to me so it is going slow. A lot of other options aren't guaranteeing uniqueness or speed. I have the uniqueness down I'm pretty sure of that but the speed (please can you throw me some ideas to speed this up? I was thinking sending to other threads but I'd hit the same problem won't I?).

0009laH
  • 1,960
  • 13
  • 27
imp
  • 435
  • 6
  • 20
  • 2
    Write to the file in larger chunks. – Dave Newton Aug 23 '23 at 14:22
  • Tangential: if these are numbers why make them strings? Numeric comparisons will be faster than string comparisons. If adding an "83" prefix is a thing, but not directly related to anything else, that can be done on number retrieval *(as opposed to during generation/de-duping/etc)*. – Dave Newton Aug 23 '23 at 14:30
  • 2
    Other simple improvements: don't create new `Random` instances all the time - you only need one. Don't print to stdout – g00se Aug 23 '23 at 14:35
  • 1
    Also tangential: naming a number `numbers` is confusing, and it's being used for two different things here. I'd consider naming it `numToGenerate` or `maxNumbers` or sthng; you can still re-use it as a constant to add to each generated number, but IMO it's funky. – Dave Newton Aug 23 '23 at 14:39
  • Another possibility if you have the memory : create a massive list of consecutive numbers, shuffle it and delete from it until you have the quantity of numbers required – g00se Aug 23 '23 at 14:46
  • ooo - I didnt know about Set :) I wont have to iterate over the object, but I still need a minimum 10m entries so i need to change the loop to make sure its filled Let me remove that new instance, I'm sure that'll save me some memory too thanks and i will try adding the '83' (yes its required) when adding to the file too. @g00se, you just saved me alot of time I can even do that directly in bash lol – imp Aug 23 '23 at 14:52
  • wow am I impressed with sets its taken less than 10minutes on first go!!! thank you - I will post a fully working answer tonight :) – imp Aug 23 '23 at 15:25
  • Explain the "83". – Basil Bourque Aug 23 '23 at 16:10
  • @BasilBourque all numbers must start with 83, 82 or 81 - its a business requirement – imp Aug 23 '23 at 16:24
  • `ThreadLocalRandom.current().ints(0, 20_000_000).distinct().limit(10_000_000)` – Johannes Kuhn Aug 23 '23 at 20:39

5 Answers5

3

the following groovy code takes ~ 2.5 seconds to generate 10 million unique integers


@groovy.transform.CompileStatic
Set<Integer> generateUniqueNumbers(int count, int bound=20000000, SplittableRandom rnd = new SplittableRandom()) {
    Set<Integer> data = new HashSet<Integer>(count)
    while(data.size() < count){
        data.add( rnd.nextInt(bound) )
    }
    return data
}

@groovy.transform.CompileStatic
def writeToFile(Set<Integer> data, String path){
    new File(path).withWriter("UTF-8"){w->
        data.each{i-> w.append('83').append(i.toString()).append('\n')}
    }
}


def t = System.currentTimeMillis()
def data  = generateUniqueNumbers(10000000)
println data.size()
println "generate time = ${System.currentTimeMillis() - t} millis"

//writing to file
t = System.currentTimeMillis()
writeToFile(data, "/11/out.txt")
println "write time = ${System.currentTimeMillis() - t} millis"

result:

10000000
generate time = 2636 millis
write time = 2182 millis

if the order of generated numbers is important - you can use LinkedHashSet instead of HashSet, but this will increase execution time by ~ x2 - x3

note @groovy.transform.CompileStatic annotation - always try to use it in groovy when performance is important and you have a large/long loop...

UPD: used SplittableRandom instead of Random because it's faster

UPD2: added write to file approach using writer. the implementation in original question File.append(...) opens and closes file every time.

daggett
  • 26,404
  • 3
  • 40
  • 56
2

tl;dr

In Java syntax:

List < String > range =
        IntStream
                .range ( 10_000_000 , 20_000_000 )   // Generate a sequence of all desired values, rather than generate random numbers.
                .mapToObj ( ( int i ) -> "83" + i )  // Convert from `int` primitive to `String` object, and prepend "83". 
                .collect ( Collectors.toCollection ( ArrayList :: new ) );
Collections.shuffle ( range , random );
Files.write ( Paths.get ( "/tmp/numbers.txt" ) , range );

Executes in 2-3 seconds on a laptop computer.

Or we can try this nifty nearly-one-liner based on Comment by Johannes Kuhn. But this takes longer, upwards of twice as long at 3.5 to 4.5 seconds.

try
{
    Files.write (
            Paths.get ( "/tmp/numbers.txt" ) ,
            SecureRandom
                    .getInstance( "NativePRNG" )
                    .ints ( 0 , 20_000_000 )
                    .distinct ( )
                    .limit ( 10_000_000 )
                    .mapToObj ( ( int i ) -> "83" + i )
                    .toList ( )
    );
}
catch ( IOException e ) { throw new RuntimeException ( e ); }

Generate a sequence of numbers, not random numbers

Apparently you want ten million numbers between ten million and twenty million. Notice that means simply the range from 10,000,000 and 20,000,000. Your range of possible values equates to your range of desired results. So we can start by assembling a sequence of ten-twenty million.

We use IntStream.range to assemble the sequence of numbers as a stream of int values.

You said you want to treat each number as text, and prepend the text "83". So we use Stream::map to convert each number to text along with the prepend.

I do not yet know Groovy, so I'll use Java syntax.

List < String > range =
        IntStream
                .range ( 10_000_000 , 20_000_000 )
                .mapToObj ( ( int i ) -> "83" + i ) 
                .collect ( Collectors.toCollection ( ArrayList :: new ) );

Report first and last values to verify.

System.out.println ( range.get ( 0 ) + " - " + range.get ( range.size ( ) - 1 ) );

And you want this sequence of ten million strings in random order. So call Collections.shuffle to re-order them randomly. No need to generate random numbers.

Collections.shuffle ( range );

Report first and last values to verify.

System.out.println ( range.get ( 0 ) + " - " + range.get ( range.size ( ) - 1 ) );

When run:

8310000000 - 8319999999
8315634260 - 8311372639

Strong random number generator

If you need a more cryptographically strong random number generator than provided by default to perform the shuffle, you can pass an alternate source of randomness. This can be Random or any subclass, including SecureRandom. For example:

Collections.shuffle ( 
    range , 
    SecureRandom.getInstance( "DRBG" ) 
);

I am no expert on random number generators. But I am guessing these may be of interest to you:

I gather from those documents that via the Java Service provider interface and SecureRandomSpi class, you can plug in third-party generator implementations in addition to the new-and-improved ones bundled with some JDKs.

Write file

And you said you need to write these to a file. The List of strings is itself an Iterable. So we can merely pass to Files.write, a method in the Java NIO package of modern file-handling classes.

try
{
    Files.write (
            Paths.get ( "/tmp/numbers.txt" ) ,
            range
    );
}
catch ( IOException e ) { throw new RuntimeException ( e ); }
}

The resulting file is 110 MB in size.

Performance

You will need a bit of free memory to run the code above. We create all the data in memory before writing to storage. I would guess it takes about 200 to 300 MB of RAM, given the file size reported above.

You have a concern about speed of execution. So let's run it once, then run several times to get an average of elapsed time. For more serious benchmarking, use JMH.

app.randomWrite ( );

int repetitions = 10;
List < Duration > durations = new ArrayList <> ( repetitions );
for ( int index = 0 ; index < repetitions ; index++ ) durations.add ( app.randomWrite ( ) );
System.out.println ( "durations = " + durations );

When run on an Apple MacBook Pro with Apple M1 Pro chip, 16 GB memory, and internal solid-state storage, the execution times are around 2.5 seconds, in a range of 2-3 seconds.

durations = [PT2.501965458S, PT2.315898625S, PT2.398674875S, PT2.24017325S, PT2.047942S]
Basil Bourque
  • 303,325
  • 100
  • 852
  • 1,154
1

If data was a set. You search at each randomization in the array, that has a complexity of O(N) and searching in a set has a complexity of O(1).

Doing N + c randomizations (c = the number of duplicates at randomization) is by itself an O(N) complexity, so if we consider only this aspect and ignore everything else from the time being, then you have an O(N^2) algorithm that can be reduced to an algorithm of O(N) simply by switching to sets.

You should also get rid of the console prints if possible. Those are costly operations. The same goes for file writing.

You should cache the latest few numbers and store them into your output file. Maybe 100 numbers. And then you can also write into the console that those numbers were added.

Lajos Arpad
  • 64,414
  • 37
  • 100
  • 175
0

The answer from @daggert was accepted but it doesnt meet my requirements. When tried it gave numbers like 1 where I needed a set size. It's accepted because I wasnt clear on this, and honestly works like a charm. I used the file writing parts mixed with the sets code I had written and this does what I need in under 20 seconds now.

int amountToGenerate = 10_000_000
int range = amountToGenerate*2
Set data = []
Random randoms = new Random()

while (data.size() < amountToGenerate){
    int number = randoms.nextInt(range)+amountToGenerate
    data.leftShift(number)
}

@groovy.transform.CompileStatic
def writeToFile(Set<Integer> data, String path){
    new File(path).withWriter("UTF-8"){w->
        data.each{i-> w.append('83').append(i.toString()).append('\n')}
    }
}
writeToFile(data, /c:\temp\out.txt/)
Basil Bourque
  • 303,325
  • 100
  • 852
  • 1,154
imp
  • 435
  • 6
  • 20
0

Just around 14 seconds on my Core i5 with 8Gb RAM to create the 40Mb file of numbers. If you need them as text, it might be better to recreate them as text as you need them.

import java.security.SecureRandom;
import java.nio.file.Path;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;
import java.nio.channels.FileChannel;
import java.nio.MappedByteBuffer;
import java.util.Set;
import java.util.HashSet;
import java.util.EnumSet;

public class Rands {
    public static void main(String[] args) throws Exception {
        final int NUMBERS = 10_000_000;
        final int MAX = 20_000_000;
        final int MIN = 83;
        Set<Integer> numbers = new HashSet<>(MAX - MIN);

        SecureRandom r = new SecureRandom();
        while (numbers.size() < NUMBERS) {
            numbers.add(r.nextInt(MIN, MAX));
        }

        Path pathToWrite = Path.of("numbers.bin");

        try (FileChannel fileChannel = (FileChannel) Files.newByteChannel(pathToWrite,
                EnumSet.of(StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING))) {

            MappedByteBuffer mappedByteBuffer = fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, numbers.size() * 4);

            if (mappedByteBuffer != null) {
                numbers.stream()
                    .forEach(n ->mappedByteBuffer.putInt(n));
            }
        }
    }
}
g00se
  • 3,207
  • 2
  • 5
  • 9