Can anyone help me reduce the time it takes to generate a list of sequence numbers? Here is my code; it takes about 0.05 seconds.

import torch
import time
import random
index = [torch.tensor(660000)]
st = time.time()
allowed = [x for x in range(index[0])] + [x for x in range(index[0] + 1,2000)]
index = random.sample(allowed, 1000)
print(time.time()-st)

Please advise

thank you

  • 2
    `[x for x in range(index[0])]` -> `list(range(index[0]))`... but why do you need a `list`? Why won't a `range` object suffice? – juanpa.arrivillaga Dec 31 '21 at 03:27
  • 1
    Note, `[x for x in whatever]` should always just be `list(whatever)` – juanpa.arrivillaga Dec 31 '21 at 03:27
  • 1
    Also, `np.arange` is probably faster, if a `numpy.ndarray` would work – juanpa.arrivillaga Dec 31 '21 at 03:28
  • 1
    It would be easier to answer if we know what you're going to do with the list – éclairevoyant Dec 31 '21 at 04:13
  • There is a `random.sample` call afterwards that requires a list. Actually, I would like to use the code in the `return_random` function from https://github.com/akwasigroch/Pretext-Invariant-Representations/blob/master/utils.py, but I feel its performance is slow. – Aryo Wibowo Dec 31 '21 at 04:22
  • *I feel the performance is slow* -- Why? If you're going to be doing tensor processing, that is going to vastly overwhelm the processing here. First, get your app running. THEN decide if parts are too slow. – Tim Roberts Dec 31 '21 at 04:27
  • I tested it, and this part takes the longest time. At the end of the day, the whole process requires ~12 hours for one epoch. I think improving this part would make the whole process faster. – Aryo Wibowo Dec 31 '21 at 04:37

1 Answer

This should help with the OP's specific case, not with the general "generate list of sequence number" problem stated in the question title.

You do not need to create the whole list. You can pass the `range` object directly to `random.sample`, and it will run at the same speed regardless of the range size.

index = random.sample(range(index[0]), 1000)

On my machine, it is ~100x faster for a range of size 1 million.
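A rough way to check this on your own machine is sketched below. The names `N`, `K`, `sample_from_list` and `sample_from_range` are made up for this illustration, and the actual speedup will depend on your hardware and Python version.

```python
import random
import timeit

N = 1_000_000  # range size, matching the "1 million" case mentioned above
K = 1000       # number of samples, as in the question


def sample_from_list():
    # Builds the full list first, then samples from it.
    return random.sample(list(range(N)), K)


def sample_from_range():
    # Samples directly from the lazy range object; no list is built.
    return random.sample(range(N), K)


# Average over a few runs to smooth out noise.
list_time = timeit.timeit(sample_from_list, number=5) / 5
range_time = timeit.timeit(sample_from_range, number=5) / 5
print(f"list:  {list_time:.6f} s per call")
print(f"range: {range_time:.6f} s per call")
```

Both functions return 1000 distinct integers from the same range; only the cost of building the list differs.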


Based on this answer

dankal444
  • Sorry, I have edited the code. How can I implement your suggestion inside `random.sample` for a range like `[x for x in range(index[0])] + [x for x in range(index[0] + 1,2000)]`? So there is a condition on the range. – Aryo Wibowo Jan 03 '22 at 03:26
  • @AryoWibowo You can sample both ranges the same way I did, then randomly choose which range to use for each sample (using a weighted random choice): `is_first_range = random.random() < first_range_weight`. You could also use NumPy functions for all of this (`np.random.choice`, `np.random.random`). – dankal444 Jan 03 '22 at 14:09
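For the "range with one value skipped" case, one possible alternative to the weighted approach in the comment above is an index-shift trick: sample from a range that is one element shorter, then shift every value at or above the skipped one up by one. This is only a sketch and not the commenter's method. `sample_excluding` is a hypothetical helper, and the values 2000 and 660 in the usage line are illustrative assumptions. If the index is a torch tensor as in the question, it would presumably need converting first with `int(index[0])`.

```python
import random


def sample_excluding(n, excluded, k):
    """Draw k distinct integers from range(n), never returning `excluded`."""
    if not 0 <= excluded < n:
        # The excluded value is outside the range, so there is nothing to skip.
        return random.sample(range(n), k)
    # Sample from a range one element shorter, then shift values at or
    # above `excluded` up by one so the gap lands exactly on `excluded`.
    picks = random.sample(range(n - 1), k)
    return [p + 1 if p >= excluded else p for p in picks]


# Illustrative values only: 1000 samples from 0..1999, skipping 660.
index = sample_excluding(2000, 660, 1000)
print(len(index), 660 in index)
```

Because the shift maps `range(n - 1)` one-to-one onto the allowed values, every allowed value stays equally likely, and no list of the full range is ever built.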