Pythonic way to separate a list into several sublists where each sublist can only take values within a given range

Question

I have a list of integers which span several useful ranges. I found in:

A way to do that and my code works, but I wondered if there was a better way to do it while making it readable?

EDIT: I would encourage anyone with a similar need to explore all of the answers given, they all have different methods and merits. Thank you to all who helped on this.

Working inelegant code

my_list = [1, 2, 11, 29, 37]
r1_lim = 10
r2_lim = 20
r3_lim = 30
r4_lim = 40


r1_goodvals = list(range(1, r1_lim+1))
print("r1_goodvals : ", r1_goodvals)

r2_goodvals = list(range(r1_lim+1, r2_lim+1))
print("r1_goodvals : ", r2_goodvals)

r3_goodvals = list(range(r2_lim+1, r3_lim+1))
print("r3_goodvals : ", r3_goodvals)

r4_goodvals = list(range(r3_lim+1, r4_lim+1))
print("r4_goodvals : ", r4_goodvals)


r1, r2, r3, r4 = [], [], [], []
for x in my_list:
    if x in r1_goodvals:
        r1.append(x)
    elif x in r2_goodvals:
        r2.append(x)
    elif x in r3_goodvals:
        r3.append(x)
    elif x in r4_goodvals:
        r4.append(x)

print("r1 : ", r1)
print("r2 : ", r2)
print("r3 : ", r3)
print("r4 : ", r4)

Output

r1_goodvals :  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
r1_goodvals :  [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
r3_goodvals :  [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
r4_goodvals :  [31, 32, 33, 34, 35, 36, 37, 38, 39, 40]
r1 :  [1, 2]
r2 :  [11]
r3 :  [29]
r4 :  [37]

Your question is better suited for [Code Review Stack Exchange](https://codereview.stackexchange.com/). — martineau, Aug 27 '21 at 08:56

blhsing · Answer 1 · 2021-08-27T10:10:31.367

You can achieve linear time complexity with minimal code by initializing the limits as a list in reverse order and making comparisons between the lowest limit at the end of the limit list to the current item in the iteration of the input list. If that limit is lower, append a new sub-list to the result list and pop the limit list for the next lowest limit. Then append the current item to the last sub-list, which will always be the one within the current limit.

my_list = [1, 2, 11, 29, 37]
limits = [40, 30, 20, 10]
r = [[]]
for i in my_list:
    if limits[-1] < i:
        r.append([])
        limits.pop()
    r[-1].append(i)

r becomes:

[[1, 2], [11], [29], [37]]

Like @Drecker mentions in the comment, however, that this solution comes with the assumption that my_list is pre-sorted. If the assumption isn't valid then it would require a cost of O(n log n) sort the list first.

Also as @Drecker mentions in the comment, some may find the use of an iterator to be more Pythonic than a list with pop, in which case the limits can be listed in the more intuitive ascending order, but there would need to be an additional variable to keep track of the current lowest limit since calling next on an iterator consumes the item:

my_list = [1, 2, 11, 29, 37]
limits = iter((10, 20, 30, 40))
limit = next(limits)
r = [[]]
for i in my_list:
    if limit < i:
        r.append([])
        limit = next(limits)
    r[-1].append(i)

I agree this is possibly the cleanest solution. However you should mentioned that you assume the `my_list` to be pre-sorted (if it is not you can sort it which would lead to lin*log complexity). Additionally -- since the question is about *pythonic* way of doing things -- I would think that using iterators instead of `pop`ing from the queue would be more suitable — Drecker, Aug 27 '21 at 10:01
@Drecker Agreed on both points. Updated my answer accordingly then. Thanks. — blhsing, Aug 27 '21 at 10:10

score 1 · Accepted Answer · answered Aug 27 '21 at 08:54

If I needed this to be more readable, and keep the same functionality, I would probably do in this kind of way. Alltough if you would start adding more r_n limits this approach would quickly become cluttered.

my_list = [1, 2, 11, 29, 37]

r1, r2, r3, r4 = [], [], [], []
for x in my_list:
    if x in range(1, 11):
        r1.append(x)
    elif x in range(11, 21):
        r2.append(x)
    elif x in range(21, 31):
        r3.append(x)
    elif x in range(31, 41):
        r4.append(x)

print("r1 : ", r1)
print("r2 : ", r2)
print("r3 : ", r3)
print("r4 : ", r4)

Using list comprehension in this case would make the runtime O(n * number_of_r_n) since you would then need to loop over the entire my_list array for each 'range'. While this has runtime O(n) with n being the length of the array.

This still has a lot of code duplications and won't scale well if there are a large number of limits. — blhsing, Aug 27 '21 at 09:49
I already mentioned that it wouldn't scale well. It does have relatively much duplication, but as OP asked, it is *more readable* than a more 'algorithmic' approach. You can quickly understand what this code does. — tjallo, Aug 27 '21 at 13:30

Drecker · Answer 3 · 2021-08-27T11:03:28.970

1

Another solution would involve using binary search:

from bisect import bisect
my_list = [1, 2, 11, 29, 37, 100]
limits = [10, 20, 30, 40]

groups = [[] for _ in range(len(limits) + 1)]
for x in my_list:
  groups[bisect(limits, x)].append(x)

print(groups)

[[1, 2], [11], [29], [37], [100]]

This is quite fast solution even for high number of limits O(number_of_elements * log(number_of_limits)) and in certain sense it is as fast as you can get, for arbitrary limits.

However, if you have additional information -- for example you want to group the numbers based on their rounding to tens and the list is pre-sorted, you could use itertools.groupby:

from itertools import groupby
my_list = [1, 2, 11, 29, 37, 100]
groups = {key: list(lst) for key, lst in groupby(my_list, lambda x: round(x, -1))}
# We could use 'x // 10' instead of 'round(x, -1)' to get 'flooring'
# hence, results would be more similar to your original usage
print(groups)

{0: [1, 2], 10: [11], 30: [29], 40: [37], 100: [100]}

You can drop the requirement on the pre-sorting, by replacing the comprehension by a full for-loop and using collections.defaultdict(list). This solution is just O(number_of_elements) in terms of time complexity

edited Aug 27 '21 at 11:03

answered Aug 27 '21 at 09:33

Drecker

1,215
10
24

1

The first solution isn't efficient since it requires a non-linear time complexity like you mentioned. The second solution requires the limits to be multiples of 10, which is a big assumption to make and is nowhere guaranteed by the OP's requirements. – blhsing Aug 27 '21 at 09:52
1

For my present use case the multiples are luckily all multiples of 10 (500 for the real use case) but a very good point, thank you. – Windy71 Aug 27 '21 at 10:05
By the way, the second solution also comes with the assumption that the list is pre-sorted, with the use of `groupby`. If it needs to be sorted first then the time complexity also becomes *O(n log n)*. – blhsing Aug 27 '21 at 10:18
@blhsing thanks, I've added the requirement to the answer – Drecker Aug 27 '21 at 11:04

Pythonic way to separate a list into several sublists where each sublist can only take values within a given range

3 Answers3