0

I want to merge a list in to range, but keeping the original order. Meanwhile with custom gap support.

for example, when input list [0, 1, 3, 7, 4, 2, 8, 9, 11, 11], it is expected to retrun a list of range, ["0-4", "0-4", "7-9", "0-4", "0-4", "0-4", "7-9", "7-9", "11-11", "11-11"].

def fun(a_list, gap_length=0):
    return a_list_of_range

# from
# [0, 1, 3, 7, 4, 2, 8, 9, 11, 11]
# to
# ["0-4", "0-4", "7-9", "0-4", "0-4", "0-4", "7-9", "7-9", "11-11", "11-11"]
# or to
# {0:"0-4", 1:"0-4", 2:"0-4", 3:"0-4", 4:"0-4", 7:"7-9", 8:"7-9", 9:"7-9", 10:"11-11"}

There is a similar question on stackoverflow, but all the answers can't return range in the corresponding order.


What is your solution?

I wrote a ugly function to solve the problem, but the speed is terrible. The function below support custom gap length for merging list into range.

def to_ranges_with_gap(input_list, gap_len=20):
    """list into range with gap"""
    loc2range = {}
    input_list = sorted(set(input_list))
    start_loc = input_list[0]
    stop_loc = input_list[0]
    range_loc_list = []
    for element in input_list:
        if element < stop_loc + gap_len:
            range_loc_list.append(element)
            stop_loc = element
        else:

            for loc in range_loc_list:
                loc2range[loc] = "{}-{}".format(start_loc, stop_loc)

            start_loc = element
            stop_loc = element
            range_loc_list = [element]

        for loc in range_loc_list:
            loc2range[loc] = "{}-{}".format(start_loc, stop_loc)

    return loc2range

Can you show me a better way to do it?


What dose the list looks like?

The list is:

  • duplicate
  • unsorted
  • not continuous
  • huge amount of elements. billions of digits span from 0 to 10^10, thus speed matters.

What's the purpose of repeating the ranges in your result list? You could probably write a more elegant solution without requirement for that quirk. – timgeb

For example if I want to deal with the dataframe below, and try to group age range to calculate the median height.

Age  Gender  Height 
2    M       30
4    M       60
2    M       33
3    F       50
20   M       180
22   F       166
40   F       150
33   M       172
...

I hope to get such result. And the age column the the list mentioned above.

2-5  M    40.5
2-6  F    50.9
10-25 M   150.8
...

Thus, it will be better if I can merge the dataframe directly, without generating an mapper and remap it to the dataframe again.

Chang Ye
  • 1,105
  • 1
  • 12
  • 25

4 Answers4

1

I have modified accepted answers code from similar question which you have provided in question and It worked for me

import itertools

def ranges(i):
    for a, b in itertools.groupby(enumerate(i), lambda i: i[1] - i[0]):
        b = list(b)
        if(b[0][1] - b[-1][1] == 0):
                yield "%d-%d"%(b[0][1], b[-1][1])
        for ele in range(b[0][1], b[-1][1]):
                yield "%d-%d"%(b[0][1], b[-1][1])

print ([ele for ele in ranges([0, 1, 2, 3, 4, 7, 8, 9, 11])])

['0-4', '0-4', '0-4', '0-4', '7-9', '7-9', '11-11']

Note: Please let me know If this is wrong way to answer, will take care of it from next time. Mine Intention was just to give appropriate answer and help others, Not to take others answer blantly etc..

Please comment below, If it is so will remove my answer.

I know ,its bad patch.

Nirmi
  • 356
  • 3
  • 11
0

this returns the results you seem to be looking for. It's not much prettier than what you have, but it works:

#!/usr/bin/python

arr = []
l = [1,2,3,5,6,7,8,9,11,12,13,14,20]
start,counter,i = (0,0,0)

while i < len(l):
    start = i
    counter = 0
    while (i < len(l) - 1) and (l[i+1] == l[i] +1):
        counter += 1
        i += 1
    for x in range(counter+1):
        arr.append("{}-{}".format(l[start], l[start+counter]))
    i += 1

print(arr)

output:

['1-3', '1-3', '1-3', '5-9', '5-9', '5-9', '5-9', '5-9', '11-14', '11-14', '11-14', '11-14', '20-20']
David Jenkins
  • 461
  • 3
  • 8
0

Code

import itertools as it
import collections as ct


# Given
a = [0, 1, 2, 3, 4, 7, 8, 9, 11]
b = [0, 1, 3, 7, 4, 2, 8, 9, 11]                       # unsorted
c = [0, 15, 2, 3, 4, 7, 8, 9, 11, 14]                  # unsorted
d = [0, 15, 2, 3, 4, 7, 8, 9, 11, 14, 2, 4]            # duplicates 


def find_ranges(iterable):
    """Return a defaultdict of ranges."""
    # Find ranges
    sorted_it = sorted(set(iterable))
    keyfunc = lambda i:  sorted_it[i[0]] - i[0]
    ranges = [[item[1] for item in g] 
            for k, g in it.groupby(enumerate(sorted_it), keyfunc)]
    # Out: [[0, 1, 2, 3, 4], [7, 8, 9], [11]]

    # Build dictionary
    dd = ct.defaultdict(int)
    for r in ranges:
        s = "{}-{}".format(min(r), max(r))
        for i in r:
            if i in r:
                dd[i] = s
    return dd

find_ranges(a)

Output

defaultdict(int,
            {0: '0-4',
            1: '0-4',
            2: '0-4',
            3: '0-4',
            4: '0-4',
            7: '7-9',
            8: '7-9',
            9: '7-9',
            11: '11-11'})

Once you have this lookup table, creating a list of ranges is simple:

[find_ranges(b)[i] for i in b]
# ['0-4', '0-4', '0-4', '7-9', '0-4', '0-4', '7-9', '7-9', '11-11']

Details

Furthermore, this function finds ranges for unsorted iterables (b and c), and handles duplicates (d).

assert find_ranges(a) == find_ranges(b)
assert find_ranges(c) == find_ranges(d)

Here we will confirm the results are equivalent for sorted and unsorted inputs. Next we will confirm equivalence for an unsorted input and an input with repeated elements. Finally, we while demonstrate an sample output:

find_ranges(d)

Output

defaultdict(int,
        {0: '0-0',
         2: '2-4',
         3: '2-4',
         4: '2-4',
         7: '7-9',
         8: '7-9',
         9: '7-9',
         11: '11-11',
         14: '14-15',
         15: '14-15'})

Note: the "Find ranges" section is inspired from @Nirmi's post, a great contribution.

pylang
  • 40,867
  • 14
  • 129
  • 121
  • Thank you very much. How can I modify the code to deal with input as `a = [0, 15, 2, 3, 4, 7, 8, 9, 11, 14]`. Sorry for not describing the question clearly. I have update the post. – Chang Ye Sep 08 '17 at 10:14
  • What is your expected output? – pylang Sep 08 '17 at 10:15
  • INPUT: `a = [0, 15, 2, 3, 4, 7, 8, 9, 11, 14] `; OUTPUT: ` {0: '0-0', 15: '14-15', 1: '1-4', 2: '1-4', 3: '1-4', 4: '1-4', 7: '7-9', 8: '7-9', 9: '7-9', 11: '11-11', 14:'14-15'})` – Chang Ye Sep 08 '17 at 10:16
  • `b = sorted(set(a)) ` may be better, for there is duplicate element in the list. Such as `a = [0, 15, 2, 3, 4, 7, 8, 9, 11, 2 , 4]` – Chang Ye Sep 08 '17 at 10:40
  • `b = sorted(set(a)) ` may be better, for there is duplicate element in the list. Such as `a = [0, 15, 2, 3, 4, 7, 8, 9, 11, 2 , 4]` – Chang Ye Sep 08 '17 at 10:40
  • Correct. This answer has been updated. I suggest updating your new requirements in your question. – pylang Sep 08 '17 at 10:54
0

Inspired by this post and @pylang, I found a O(n) solution.


import itertools
import collections

# an example list
l = [100, 107, 0, 1, 2, 3, 5, 6, 10, 11, 65, 66, 68, 68]

class Gap(object):
    def __init__(self, diff):
        self.diff, self.flag, self.prev = diff, [0,1], None
    def __call__(self, elem):
        if self.prev and abs(self.prev - elem) > self.diff:
            self.flag = self.flag[::-1]
        self.prev= elem
        return self.flag[0]


def list_to_ranges_with_gap(raw_list, gap_length = 1):
    """Return a defaultdict of ranges."""

    # Find ranges
    sorted_list = sorted(set(raw_list))
    merged_ranges = [list(g) for k, g in 
                     itertools.groupby(sorted_list, key = Gap(gap_length))]

    # Build dictionary
    list2range = collections.defaultdict(int)
    for r in merged_ranges:
        for i in r:
            list2range[i] = "{}-{}".format(r[0], r[-1])

    return list2range

list_to_ranges_with_gap(l, 10)

defaultdict(int,
            {0: '0-11',
             1: '0-11',
             2: '0-11',
             3: '0-11',
             5: '0-11',
             6: '0-11',
             10: '0-11',
             11: '0-11',
             65: '65-68',
             66: '65-68',
             68: '65-68',
             100: '100-107',
             107: '100-107'})
Chang Ye
  • 1,105
  • 1
  • 12
  • 25