map a list of integers into corresponding range in python

Question

I want to merge a list in to range, but keeping the original order. Meanwhile with custom gap support.

for example, when input list [0, 1, 3, 7, 4, 2, 8, 9, 11, 11], it is expected to retrun a list of range, ["0-4", "0-4", "7-9", "0-4", "0-4", "0-4", "7-9", "7-9", "11-11", "11-11"].

def fun(a_list, gap_length=0):
    return a_list_of_range

# from
# [0, 1, 3, 7, 4, 2, 8, 9, 11, 11]
# to
# ["0-4", "0-4", "7-9", "0-4", "0-4", "0-4", "7-9", "7-9", "11-11", "11-11"]
# or to
# {0:"0-4", 1:"0-4", 2:"0-4", 3:"0-4", 4:"0-4", 7:"7-9", 8:"7-9", 9:"7-9", 10:"11-11"}

There is a similar question on stackoverflow, but all the answers can't return range in the corresponding order.

What is your solution?

I wrote a ugly function to solve the problem, but the speed is terrible. The function below support custom gap length for merging list into range.

def to_ranges_with_gap(input_list, gap_len=20):
    """list into range with gap"""
    loc2range = {}
    input_list = sorted(set(input_list))
    start_loc = input_list[0]
    stop_loc = input_list[0]
    range_loc_list = []
    for element in input_list:
        if element < stop_loc + gap_len:
            range_loc_list.append(element)
            stop_loc = element
        else:

            for loc in range_loc_list:
                loc2range[loc] = "{}-{}".format(start_loc, stop_loc)

            start_loc = element
            stop_loc = element
            range_loc_list = [element]

        for loc in range_loc_list:
            loc2range[loc] = "{}-{}".format(start_loc, stop_loc)

    return loc2range

Can you show me a better way to do it?

What dose the list looks like?

The list is:

duplicate
unsorted
not continuous
huge amount of elements. billions of digits span from 0 to 10^10, thus speed matters.

What's the purpose of repeating the ranges in your result list? You could probably write a more elegant solution without requirement for that quirk. – timgeb

For example if I want to deal with the dataframe below, and try to group age range to calculate the median height.

Age  Gender  Height 
2    M       30
4    M       60
2    M       33
3    F       50
20   M       180
22   F       166
40   F       150
33   M       172
...

I hope to get such result. And the age column the the list mentioned above.

2-5  M    40.5
2-6  F    50.9
10-25 M   150.8
...

Thus, it will be better if I can merge the dataframe directly, without generating an mapper and remap it to the dataframe again.

What's the purpose of repeating the ranges in your result list? You could probably write a more elegant solution without requirement for that quirk. — timgeb, Sep 08 '17 at 09:11
Also why not store the ranges as instances of `xrange` instead of strings? — timgeb, Sep 08 '17 at 09:12
@timgeb I want to group a dataframe by value in a certain column, thus I try to generate a dictionary of range. — Chang Ye, Sep 08 '17 at 09:19
I'm not sure about your results, but it seems like you want a histogram. — pylang, Sep 08 '17 at 09:33
@pylang It looks like a histogram, but the data is not continuous and I hope the code can detect the range automatically. — Chang Ye, Sep 08 '17 at 09:50

Nirmi · Answer 1 · 2017-09-08T10:05:30.527

1

I have modified accepted answers code from similar question which you have provided in question and It worked for me

import itertools

def ranges(i):
    for a, b in itertools.groupby(enumerate(i), lambda i: i[1] - i[0]):
        b = list(b)
        if(b[0][1] - b[-1][1] == 0):
                yield "%d-%d"%(b[0][1], b[-1][1])
        for ele in range(b[0][1], b[-1][1]):
                yield "%d-%d"%(b[0][1], b[-1][1])

print ([ele for ele in ranges([0, 1, 2, 3, 4, 7, 8, 9, 11])])

['0-4', '0-4', '0-4', '0-4', '7-9', '7-9', '11-11']

Note: Please let me know If this is wrong way to answer, will take care of it from next time. Mine Intention was just to give appropriate answer and help others, Not to take others answer blantly etc..

Please comment below, If it is so will remove my answer.

I know ,its bad patch.

edited Sep 08 '17 at 10:05

answered Sep 08 '17 at 09:30

Nirmi

356
3
11

This is not the solution. I want to keep the corresponding order. BTW, the lambda in the code should be replaced with `lambda i: i[1]- x[0]` – Chang Ye Sep 08 '17 at 09:47
Thank you @Nirmi. This code is elegant. But I want to get a mapper for the list without altering the order. – Chang Ye Sep 08 '17 at 10:04
@ChangYe. Your question get modified? – Nirmi Sep 08 '17 at 10:07
@Nirmi It works! But Is there any solution when the list is unsorted? – Chang Ye Sep 08 '17 at 10:07
@Nirmi Yes. I doesn't describe the question clearly in the original post, and I have modified the post. Sorry about that.. – Chang Ye Sep 08 '17 at 10:20
@ChangYe Using this method Its not possible to get answer, will try for same and update you – Nirmi Sep 08 '17 at 10:24

score 0 · Answer 2 · answered Sep 08 '17 at 09:56

0

this returns the results you seem to be looking for. It's not much prettier than what you have, but it works:

#!/usr/bin/python

arr = []
l = [1,2,3,5,6,7,8,9,11,12,13,14,20]
start,counter,i = (0,0,0)

while i < len(l):
    start = i
    counter = 0
    while (i < len(l) - 1) and (l[i+1] == l[i] +1):
        counter += 1
        i += 1
    for x in range(counter+1):
        arr.append("{}-{}".format(l[start], l[start+counter]))
    i += 1

print(arr)

output:

['1-3', '1-3', '1-3', '5-9', '5-9', '5-9', '5-9', '5-9', '11-14', '11-14', '11-14', '11-14', '20-20']

answered Sep 08 '17 at 09:56

David Jenkins

461
3
8

What if `l` is unsorted? – Chang Ye Sep 08 '17 at 10:00
then sort it before running the operation – David Jenkins Sep 08 '17 at 10:02

pylang · Answer 3 · 2017-09-08T11:08:15.343

Code

import itertools as it
import collections as ct


# Given
a = [0, 1, 2, 3, 4, 7, 8, 9, 11]
b = [0, 1, 3, 7, 4, 2, 8, 9, 11]                       # unsorted
c = [0, 15, 2, 3, 4, 7, 8, 9, 11, 14]                  # unsorted
d = [0, 15, 2, 3, 4, 7, 8, 9, 11, 14, 2, 4]            # duplicates 


def find_ranges(iterable):
    """Return a defaultdict of ranges."""
    # Find ranges
    sorted_it = sorted(set(iterable))
    keyfunc = lambda i:  sorted_it[i[0]] - i[0]
    ranges = [[item[1] for item in g] 
            for k, g in it.groupby(enumerate(sorted_it), keyfunc)]
    # Out: [[0, 1, 2, 3, 4], [7, 8, 9], [11]]

    # Build dictionary
    dd = ct.defaultdict(int)
    for r in ranges:
        s = "{}-{}".format(min(r), max(r))
        for i in r:
            if i in r:
                dd[i] = s
    return dd

find_ranges(a)

Output

defaultdict(int,
            {0: '0-4',
            1: '0-4',
            2: '0-4',
            3: '0-4',
            4: '0-4',
            7: '7-9',
            8: '7-9',
            9: '7-9',
            11: '11-11'})

Once you have this lookup table, creating a list of ranges is simple:

[find_ranges(b)[i] for i in b]
# ['0-4', '0-4', '0-4', '7-9', '0-4', '0-4', '7-9', '7-9', '11-11']

Details

Furthermore, this function finds ranges for unsorted iterables (b and c), and handles duplicates (d).

assert find_ranges(a) == find_ranges(b)
assert find_ranges(c) == find_ranges(d)

Here we will confirm the results are equivalent for sorted and unsorted inputs. Next we will confirm equivalence for an unsorted input and an input with repeated elements. Finally, we while demonstrate an sample output:

find_ranges(d)

Output

defaultdict(int,
        {0: '0-0',
         2: '2-4',
         3: '2-4',
         4: '2-4',
         7: '7-9',
         8: '7-9',
         9: '7-9',
         11: '11-11',
         14: '14-15',
         15: '14-15'})

Note: the "Find ranges" section is inspired from @Nirmi's post, a great contribution.

Thank you very much. How can I modify the code to deal with input as `a = [0, 15, 2, 3, 4, 7, 8, 9, 11, 14]`. Sorry for not describing the question clearly. I have update the post. — Chang Ye, Sep 08 '17 at 10:14
INPUT: `a = [0, 15, 2, 3, 4, 7, 8, 9, 11, 14] `; OUTPUT: ` {0: '0-0', 15: '14-15', 1: '1-4', 2: '1-4', 3: '1-4', 4: '1-4', 7: '7-9', 8: '7-9', 9: '7-9', 11: '11-11', 14:'14-15'})` — Chang Ye, Sep 08 '17 at 10:16
`b = sorted(set(a)) ` may be better, for there is duplicate element in the list. Such as `a = [0, 15, 2, 3, 4, 7, 8, 9, 11, 2 , 4]` — Chang Ye, Sep 08 '17 at 10:40
Correct. This answer has been updated. I suggest updating your new requirements in your question. — pylang, Sep 08 '17 at 10:54

Chang Ye · Answer 4 · 2017-09-09T10:54:13.870

Inspired by this post and @pylang, I found a O(n) solution.

import itertools
import collections

# an example list
l = [100, 107, 0, 1, 2, 3, 5, 6, 10, 11, 65, 66, 68, 68]

class Gap(object):
    def __init__(self, diff):
        self.diff, self.flag, self.prev = diff, [0,1], None
    def __call__(self, elem):
        if self.prev and abs(self.prev - elem) > self.diff:
            self.flag = self.flag[::-1]
        self.prev= elem
        return self.flag[0]


def list_to_ranges_with_gap(raw_list, gap_length = 1):
    """Return a defaultdict of ranges."""

    # Find ranges
    sorted_list = sorted(set(raw_list))
    merged_ranges = [list(g) for k, g in 
                     itertools.groupby(sorted_list, key = Gap(gap_length))]

    # Build dictionary
    list2range = collections.defaultdict(int)
    for r in merged_ranges:
        for i in r:
            list2range[i] = "{}-{}".format(r[0], r[-1])

    return list2range

list_to_ranges_with_gap(l, 10)

defaultdict(int,
            {0: '0-11',
             1: '0-11',
             2: '0-11',
             3: '0-11',
             5: '0-11',
             6: '0-11',
             10: '0-11',
             11: '0-11',
             65: '65-68',
             66: '65-68',
             68: '65-68',
             100: '100-107',
             107: '100-107'})

map a list of integers into corresponding range in python

4 Answers4