2

I want to make an array that looks something like

[0, 0, 0, 1, 1, 1, 2, 2, 2, . . . etc]

or

[4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, . . . etc]

There is something like

segments = [i for i in range(32)]  

which will make

[0, 1, 2, 3, 4, . . . etc]

I could build this from 3 separate passes over range(32), but I am looking to save computation by iterating over it only once.

What's the most computationally efficient and programmatically elegant way of making an array like

[0, 0, 0, 1, 1, 1, 2, 2, 2, . . . etc]
SantoshGupta7
  • Do you actually mean array or do you mean a list? – dfundako May 31 '18 at 20:23
  • 2
    You can do `sorted(range(32)*3)` – pault May 31 '18 at 20:24
  • 1
    @pault https://pythonclock.org/ :) – llllllllll May 31 '18 at 20:29
  • dfundako I realize I used the wrong terminology in my post but either one is fine. pault I tried segment_idx = [ sorted(range(32)*3)] but it gave me an error – SantoshGupta7 May 31 '18 at 20:31
  • @liliscent it's up to my employer, and sadly I don't think they're going to change. I assume you mean it should be `list(sorted(range(32)*3))` for python3 but there are better answers below. – pault May 31 '18 at 20:31
  • 1
    @pault: no: rather `sorted(list(range(32))*3)` – Jean-François Fabre May 31 '18 at 20:33
  • @Jean-FrançoisFabre shows how little I know about 3.x. – pault May 31 '18 at 20:34
  • @pault not really a problem. First step for you: install it and watch your python 2 scripts crash :) you can prepare for those crashes by using the `-3` option of recent python 2 interpreters, which warn you for most classic incompatibilities – Jean-François Fabre May 31 '18 at 20:36
  • 3
    @SantoshGupta7 your question is good, it's just that the title makes no sense / is completely irrelevant to the body of the question. I suggest you create a better title, because this question is worth keeping (would make search easier) – Jean-François Fabre May 31 '18 at 20:40
  • @Jean-FrançoisFabre I see that now, will change it. However, having trouble coming up with a good title. It's a simple problem, but the best I could think of is 'Most computationally efficient way to make a Python list like [0, 0, 0, 1, 1, 1, 2, 2, 2 . . . etc.]' which is not exactly clear either. – SantoshGupta7 May 31 '18 at 20:53
  • updated the title @Jean-FrançoisFabre – SantoshGupta7 Jun 03 '18 at 23:09

5 Answers

7

Use itertools.chain on itertools.repeat iterables:

import itertools

result = list(itertools.chain.from_iterable(itertools.repeat(i,3) for i in range(32)))

print(result)

result:

[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 11, 11, 11, 12, 12, 12, 13, 13, 13, 14, 14, 14, 15, 15, 15, 16, 16, 16, 17, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 22, 23, 23, 23, 24, 24, 24, 25, 25, 25, 26, 26, 26, 27, 27, 27, 28, 28, 28, 29, 29, 29, 30, 30, 30, 31, 31, 31]

This technique avoids creating intermediate lists and keeps pure-Python looping to a minimum: there is only one Python-level loop, the generator expression. Replacing that last loop with map might be possible, but here it would require a lambda, which adds one more function call per item.
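
As a rough sketch, the same idea can be wrapped in a small helper. The names `repeat_each` and `repeat_each_map` are hypothetical, not part of itertools. The second variant is not from the answer: it assumes that passing `itertools.repeat(reps)` as a second iterable to map is an acceptable way to drop the generator expression without a lambda, and whether it is actually faster has not been measured here.

```python
import itertools

def repeat_each(values, reps):
    """Return a list with each item of `values` repeated `reps` times in a row."""
    return list(itertools.chain.from_iterable(
        itertools.repeat(v, reps) for v in values))

def repeat_each_map(values, reps):
    """Same result; map() calls itertools.repeat(v, reps) for each v."""
    # itertools.repeat(reps) is an endless stream of `reps`;
    # map() stops as soon as `values` is exhausted.
    return list(itertools.chain.from_iterable(
        map(itertools.repeat, values, itertools.repeat(reps))))

print(repeat_each(range(3), 3))         # [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(repeat_each_map(range(4, 7), 4))  # [4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6]
```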

EDIT: let's bench this answer and Ted's answer

import itertools,time

n=1000000

start_time = time.time()
for _ in range(n):
    list(itertools.chain.from_iterable(itertools.repeat(i,3) for i in range(32)))

print("itertools",time.time() - start_time)

start_time = time.time()
for _ in range(n):
    [i for i in range(32) for _ in range(3)]
print("flat listcomp",time.time() - start_time)

results:

itertools 10.719785928726196
flat listcomp 13.869723081588745

So using itertools instead of a list comprehension is around 30% faster (Python 3.4, Windows).
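
For anyone who wants to rerun the comparison, here is a minimal sketch using the standard timeit module instead of manual time.time() bookkeeping. The function names are made up for illustration, and the iteration count is deliberately much smaller than in the bench above, so the absolute numbers will not match those results and will vary by machine and Python version.

```python
import itertools
import timeit

def with_itertools():
    return list(itertools.chain.from_iterable(
        itertools.repeat(i, 3) for i in range(32)))

def with_listcomp():
    return [i for i in range(32) for _ in range(3)]

# Both approaches must build the same list before timing means anything.
assert with_itertools() == with_listcomp()

# `number` is kept small so the sketch finishes quickly; raise it for stabler figures.
for func in (with_itertools, with_listcomp):
    best = min(timeit.repeat(func, number=10_000, repeat=3))
    print(func.__name__, best)
```

Taking the minimum of several repeats is a common way to reduce noise from other processes.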

Notes:

With so few repeats per value, the overhead of the itertools.repeat calls in the inner loop dominates, so for 3 repeats it's faster to do what NickA suggests:

list(itertools.chain.from_iterable((i,)*3 for i in range(32)))

(7 seconds vs 10 in the above bench)

And the numpy solution is even faster (around 1.5 seconds), if you can use numpy:

import numpy as np
np.arange(32).repeat(3)  # credits: liliscent 
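
A short sketch of the same numpy call applied to the question's second example (values 4 to 6, each repeated 4 times), assuming numpy is installed. `.tolist()` is only needed if a plain Python list is wanted at the end.

```python
import numpy as np

# Each value of arange(4, 7) repeated 4 times in a row.
arr = np.arange(4, 7).repeat(4)
print(arr.tolist())  # [4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6]

# np.repeat is the function form of the same operation.
same = np.repeat(np.arange(4, 7), 4)
print(np.array_equal(arr, same))  # True
```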
Jean-François Fabre
  • This answer should be more performant than a plain list comprehension for large data. Though for larger data we will use NumPy... – llllllllll May 31 '18 at 20:37
  • numpy is cheating, but yes :) – Jean-François Fabre May 31 '18 at 20:38
  • @liliscent around what data size should we start using numpy? Jean-François Fabre , there is a simpler solution below by Ted Klein Bergman though this one is more voted. Is there some hidden efficiency reason that your solution has which the other doesn't? – SantoshGupta7 May 31 '18 at 20:49
  • the other solution is simpler, and okay. It's just that in python it's better to minimize loops (and also function calls). We should bench both solutions to see which one is best – Jean-François Fabre May 31 '18 at 20:50
  • @SantoshGupta7 When you want something like the one in your question, it's often better to use NumPy to facilitate further operation. In NumPy, use `np.arange(32).repeat(3)`. – llllllllll May 31 '18 at 20:55
  • @liliscent you should post an answer with numpy, since it's around 7 times _faster_ than my answer :) converting array to list using `.tolist()` slows down but it's still 5 times faster. – Jean-François Fabre May 31 '18 at 20:57
  • 1
    Note: Given the small number of repeats needed, I suspect the setup overhead of `repeat` will hurt you. Changing to `list(itertools.chain.from_iterable((i,)*3 for i in range(32)))` would move from general object construction to interpreter supported syntax that uses less general code paths; bad for memory if the multiplier is high, but likely cheaper for small multipliers as in this case. – ShadowRanger May 31 '18 at 21:05
  • well, looks like you're right. It's faster with only 3 elements. Probably slower if there are more repeats to be done. – Jean-François Fabre May 31 '18 at 21:09
  • I benched the two solutions and this one is faster, .002 vs .003. I also tried the numpy one and it was about the same or maybe a bit slower. But that was for np.arange(32).repeat(3), and I saw that numpy's advantage is when handling bigger data, so I did 5 and 4000 instead of 3 and 32 and numpy was the fastest. – SantoshGupta7 May 31 '18 at 21:20
  • now you have a bunch of valid solutions, make your pick depending on your values. that was a nice challenge. – Jean-François Fabre May 31 '18 at 21:23
4

Just use nested loops in the list comprehension.

segments = [i for i in range(32) for _ in range(3)]

Output:

[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 11, 11, 11, 12, 12, 12, 13, 13, 13, 14, 14, 14, 15, 15, 15, 16, 16, 16, 17, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 22, 23, 23, 23, 24, 24, 24, 25, 25, 25, 26, 26, 26, 27, 27, 27, 28, 28, 28, 29, 29, 29, 30, 30, 30, 31, 31, 31]
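
To make the order of the two `for` clauses explicit, here is a sketch of the equivalent nested loops, applied to the question's second example. `repeat_each_loop` is a hypothetical name used only for illustration.

```python
def repeat_each_loop(start, stop, reps):
    """Explicit-loop equivalent of [i for i in range(start, stop) for _ in range(reps)]."""
    result = []
    for i in range(start, stop):   # outer `for` of the comprehension
        for _ in range(reps):      # inner `for`: runs `reps` times per i
            result.append(i)
    return result

print(repeat_each_loop(4, 7, 4))  # [4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6]
```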
Ted Klein Bergman
2

Use floor division:

def repeated_value_list(repeats, start, stop=None):
    if stop is None:
        start, stop = 0, start
    return [x//repeats for x in range(start*repeats, stop*repeats)]

Example output:

>>> repeated_value_list(3, 5)
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]

>>> repeated_value_list(3, 4, 10)
[4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9]
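
Why this works: every block of `repeats` consecutive integers shares the same quotient under floor division. The sketch below checks that against the nested-loop comprehension; both function names are hypothetical and `stop` is made mandatory to keep it short. Because Python's `//` rounds toward negative infinity, negative starts come out right as well.

```python
def repeated_by_floor_division(repeats, start, stop):
    # Integers start*repeats .. stop*repeats - 1; each run of `repeats`
    # consecutive integers has the same quotient.
    return [x // repeats for x in range(start * repeats, stop * repeats)]

def repeated_by_nested_loops(repeats, start, stop):
    return [i for i in range(start, stop) for _ in range(repeats)]

# Compare the two approaches, including a range that starts below zero.
for start, stop in [(0, 5), (4, 10), (-2, 2)]:
    assert (repeated_by_floor_division(3, start, stop)
            == repeated_by_nested_loops(3, start, stop))

print(repeated_by_floor_division(3, -2, 1))  # [-2, -2, -2, -1, -1, -1, 0, 0, 0]
```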

This is even more efficient if you actually want a numpy array, since broadcasting lets the floor division happen without a comprehension:

import numpy as np

def repeated_value_array(repeats, start, stop=None):
    if stop is None:
        start, stop = 0, start
    return np.arange(start*repeats, stop*repeats) // repeats

Output:

>>> repeated_value_array(3, 5)
array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], dtype=int32)
Blckknght
1

If we had [(0, 0, 0), (1, 1, 1), …], we'd just have to flatten that:

[elem for sublst in lst for elem in sublst]

How do we get that? Well, if we had three separate sequences [0, 1, 2, …], we could just zip them together:

lst = zip(r1, r2, r3)

And those three sequences are just range(32):

lst = zip(range(32), range(32), range(32))

… or, if you want it to be dynamic rather than exactly 32 and 3:

lst = zip(*(range(count) for _ in range(reps)))

Either way, you can put it together into a one-liner:

[elem for sublst in zip(*(range(count) for _ in range(reps))) for elem in sublst]

And then you can simplify that:

[elem for elem in range(count) for _ in range(reps)]
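
The steps above can be checked end to end with a small sketch; the values of `count` and `reps` are arbitrary and chosen only to keep the output short.

```python
count, reps = 4, 3

# Step 1: zip `reps` copies of range(count) into tuples like (0, 0, 0).
lst = list(zip(*(range(count) for _ in range(reps))))
print(lst)  # [(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, 3)]

# Step 2: flatten the tuples.
flattened = [elem for sublst in lst for elem in sublst]

# The simplified comprehension skips the intermediate tuples entirely.
simplified = [elem for elem in range(count) for _ in range(reps)]

print(flattened == simplified)  # True
print(simplified)               # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```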
abarnert
0

You can do this using itertools.chain.from_iterable:

>>> list(itertools.chain.from_iterable([[i]*3 for i in range(32)]))
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 11, 11, 11, 12, 12, 12, 13, 13, 13, 14, 14, 14, 15, 15, 15, 16, 16, 16, 17, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 22, 23, 23, 23, 24, 24, 24, 25, 25, 25, 26, 26, 26, 27, 27, 27, 28, 28, 28, 29, 29, 29, 30, 30, 30, 31, 31, 31]
Nick is tired
  • Definitely do not want to do this. Read [this answer](https://stackoverflow.com/a/41772165/5858851). (not the downvoter) – pault May 31 '18 at 20:26
  • @Jean-FrançoisFabre why's that? – Nick is tired May 31 '18 at 20:35
  • because 1) it gives values above 31, and it doesn't yield identical values 3 times. OP title is misleading, I admit – Jean-François Fabre May 31 '18 at 20:37
  • @Jean-FrançoisFabre I'm aware of that, look at the question title, and the first line mentions looking to make an array *like* `[0,0,0,1,1,1,...]`. I'm inclined to believe that OP is looking for functionality similar to that shown in the second example I've given, as indicated by the `[i, i+1, i+3 for i in range(32)]` they provided – Nick is tired May 31 '18 at 20:38
  • @Jean-FrançoisFabre Ah, I see you agree on the main post that the title is a bit off-putting, if it's changed to make more sense i'll remove this and +1 yours as they're essentially the same answer although yours better written – Nick is tired May 31 '18 at 20:42
  • don't remove it. Just edit the last part out. My secret technique is "never read question titles" :) – Jean-François Fabre May 31 '18 at 20:43
  • @Jean-FrançoisFabre Yeah, I used to do that, then got told off when I said questions that don't have a question in the body are "unsure what you're asking" when the actual question is in the title – Nick is tired May 31 '18 at 20:44
  • @NickA turns out that with the current repeat value (3), your answer is 30% faster than mine :) – Jean-François Fabre May 31 '18 at 21:13
  • @Jean-FrançoisFabre I gather yours vastly improves compared to mine when that number gets higher however :) – Nick is tired Jun 01 '18 at 08:08
  • so there's no absolute truth :) – Jean-François Fabre Jun 01 '18 at 08:20