How to generate a list of sequential numbers faster in Python

Question

Can anyone improve the processing time for generating a list of sequential numbers? Here is my code; it takes about 0.05 seconds.

import torch
import time
import random
index = [torch.tensor(660000)]
st = time.time()
allowed = [x for x in range(index[0])] + [x for x in range(index[0] + 1,2000)]
index = random.sample(allowed, 1000)
print(time.time()-st)

Please advise

thank you

`[x for x in range(index[0])]` -> `list(range(index[0]))`... but why do you need a `list`? Why won't a `range` object suffice? — juanpa.arrivillaga, Dec 31 '21 at 03:27
Note, `[x for x in whatever]` should always just be `list(whatever)` — juanpa.arrivillaga, Dec 31 '21 at 03:27
Also, `np.arange` is probably faster, if a `numpy.ndarray` would work — juanpa.arrivillaga, Dec 31 '21 at 03:28
It would be easier to answer if we know what you're going to do with the list — éclairevoyant, Dec 31 '21 at 04:13
A `random.sample` call follows, and it requires a list. I actually want to use the `return_random` function from this GitHub code: https://github.com/akwasigroch/Pretext-Invariant-Representations/blob/master/utils.py but I feel its performance is slow — Aryo Wibowo, Dec 31 '21 at 04:22
*I feel the performance is slow* -- Why? If you're going to be doing tensor processing, that is going to vastly overwhelm the processing here. First, get your app running. THEN decide if parts are too slow. — Tim Roberts, Dec 31 '21 at 04:27
I tested it, and this part takes the longest. At the end of the day, the whole process requires ~12 hours for one epoch, so I think improving this part would speed things up. — Aryo Wibowo, Dec 31 '21 at 04:37

score 0 · Answer 1 · answered Dec 31 '21 at 16:44

This should help with the OP's specific case, not with the general "generate a list of sequential numbers" problem stated in the question title.

You do not need to create the whole list. You can pass the range directly to random.sample. A range supports len() and indexing without storing its elements, so this runs at the same speed regardless of the range size.

index = random.sample(range(index[0]), 1000)

On my machine it is ~100x faster for a range size of 1 million.
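A rough way to check the speed difference yourself is sketched below. The function names, N, K, and the repetition count are illustrative assumptions rather than part of the original measurement, and the actual ratio will depend on your machine and Python version.

```python
import random
import timeit

N = 1_000_000  # size of the range to sample from (illustrative)
K = 1000       # number of samples, as in the question


def sample_via_list():
    # Materializes all N integers in memory before sampling.
    return random.sample(list(range(N)), K)


def sample_via_range():
    # random.sample accepts any sequence; the range is never expanded.
    return random.sample(range(N), K)


# Time a few repetitions of each approach.
t_list = timeit.timeit(sample_via_list, number=5)
t_range = timeit.timeit(sample_via_range, number=5)
print(f"list:  {t_list:.4f} s")
print(f"range: {t_range:.4f} s")
```

Both functions return K distinct integers; only the cost of preparing the population differs.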

Based on this answer

dankal444

Sorry, I have edited the code. How can I apply your suggestion to a range like [x for x in range(index[0])] + [x for x in range(index[0] + 1,2000)] inside random.sample? So, there is a condition on the range. – Aryo Wibowo Jan 03 '22 at 03:26
@AryoWibowo You can sample both ranges the same way I did, then randomly choose which range to use for each sample (using a weighted random choice): `is_first_range = random.random() < first_range_weight`. You could also use NumPy functions for all of this (`np.random.choice`, `np.random.random`). – dankal444 Jan 03 '22 at 14:09
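For the "every index except one" case, a simpler alternative to the weighted choice described in the comment above is to sample from a range that is one element shorter and shift the values at or above the excluded index up by one. This is a different technique from the one in the comment, shown only as a minimal sketch: the helper name sample_excluding and the values 2000, 660, and 1000 are hypothetical, and it assumes 0 <= excluded < n.

```python
import random


def sample_excluding(n, excluded, k):
    """Draw k distinct integers from range(n), never returning `excluded`.

    Hypothetical helper; assumes 0 <= excluded < n and k <= n - 1.
    """
    # Sample from a range one element shorter than range(n) ...
    picks = random.sample(range(n - 1), k)
    # ... then shift every value at or above `excluded` up by one,
    # which maps range(n - 1) one-to-one onto range(n) without `excluded`.
    return [p + 1 if p >= excluded else p for p in picks]


# Illustrative values: 1000 distinct indices from 0..1999, skipping 660.
index = sample_excluding(2000, 660, 1000)
print(len(index), 660 in index)
```

Because the shift is a one-to-one mapping, each allowed value stays equally likely, and no list of the allowed indices is ever built.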
