I want to merge a list into ranges while keeping the original order, with support for a custom gap.
For example, for the input list [0, 1, 3, 7, 4, 2, 8, 9, 11, 11], it is expected to return a list of ranges, ["0-4", "0-4", "0-4", "7-9", "0-4", "0-4", "7-9", "7-9", "11-11", "11-11"].
def fun(a_list, gap_length=0):
    a_list_of_range = []  # one "start-stop" string per element of a_list
    ...
    return a_list_of_range
# from
# [0, 1, 3, 7, 4, 2, 8, 9, 11, 11]
# to
# ["0-4", "0-4", "0-4", "7-9", "0-4", "0-4", "7-9", "7-9", "11-11", "11-11"]
# or to
# {0:"0-4", 1:"0-4", 2:"0-4", 3:"0-4", 4:"0-4", 7:"7-9", 8:"7-9", 9:"7-9", 11:"11-11"}
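A minimal vectorized sketch with NumPy, assuming that two neighbouring distinct values belong to the same range when they differ by at most `gap_length + 1` (so `gap_length=0` merges only consecutive integers, as in the example). The name `to_ranges_in_order` is hypothetical. It sorts the distinct values once with `np.unique`, marks a break wherever the step between them is too large, and uses the `return_inverse` indices to put the labels back in the input order.

```python
import numpy as np

def to_ranges_in_order(values, gap_length=0):
    """Label every element of `values` with the "start-stop" range it belongs to.

    Assumption: distinct neighbours at most gap_length + 1 apart share a range.
    The result has the same length and order as `values`.
    """
    arr = np.asarray(values, dtype=np.int64)
    if arr.size == 0:
        return []
    # Sorted distinct values, plus the position of each input element in them.
    uniq, inverse = np.unique(arr, return_inverse=True)
    # True where a new range starts between uniq[i] and uniq[i + 1].
    breaks = np.diff(uniq) > gap_length + 1
    # Range index of every distinct value.
    group_of_uniq = np.concatenate(([0], np.cumsum(breaks)))
    # First and last distinct value of each range.
    starts = uniq[np.concatenate(([True], breaks))]
    stops = uniq[np.concatenate((breaks, [True]))]
    # Build each label string once per range, not once per element.
    labels = np.array([f"{s}-{e}" for s, e in zip(starts, stops)])
    # Map range labels back onto the original (unsorted, duplicated) order.
    return labels[group_of_uniq[inverse]].tolist()
```

For the example input this returns `["0-4", "0-4", "0-4", "7-9", "0-4", "0-4", "7-9", "7-9", "11-11", "11-11"]`; the dict form can then be built with `dict(zip(values, labels))`.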
There is a similar question on Stack Overflow, but none of its answers return the ranges in the original order of the input.
What is your solution?
I wrote an ugly function to solve the problem, but it is very slow. The function below supports a custom gap length when merging the list into ranges.
def to_ranges_with_gap(input_list, gap_len=20):
    """list into range with gap"""
    loc2range = {}
    input_list = sorted(set(input_list))  # drop duplicates and sort
    start_loc = input_list[0]
    stop_loc = input_list[0]
    range_loc_list = []  # values collected for the current range
    for element in input_list:
        if element < stop_loc + gap_len:
            # close enough to the previous value: extend the current range
            range_loc_list.append(element)
            stop_loc = element
        else:
            # gap too large: record the finished range, then start a new one
            for loc in range_loc_list:
                loc2range[loc] = "{}-{}".format(start_loc, stop_loc)
            start_loc = element
            stop_loc = element
            range_loc_list = [element]
    # record the last range
    for loc in range_loc_list:
        loc2range[loc] = "{}-{}".format(start_loc, stop_loc)
    return loc2range
Can you show me a better way to do it?
What does the list look like?
The list is:
- full of duplicates
- unsorted
- not contiguous
- huge: billions of integers ranging from 0 to 10^10, so speed matters.
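At billions of elements, creating one Python string per element is likely to dominate both time and memory. A hedged alternative sketch returns an integer range id per element plus the start/stop arrays, so label strings are only needed per range. It assumes, as a labelled choice, that distinct neighbours at most `gap_length + 1` apart share a range; `range_ids` is a hypothetical name. Instead of `return_inverse`, it labels the original values with `np.searchsorted` against the range starts. `np.unique` still sorts a full copy of the data, and this sketch has not been benchmarked at 10^9 scale or adapted for chunked processing.

```python
import numpy as np

def range_ids(values, gap_length=0):
    """Return (ids, starts, stops), keeping the input order of `values`.

    ids[i] is the index of the range containing values[i]; that range spans
    starts[ids[i]]..stops[ids[i]]. Assumption: distinct neighbours at most
    gap_length + 1 apart share a range.
    """
    arr = np.asarray(values, dtype=np.int64)  # 10^10 does not fit in int32
    uniq = np.unique(arr)  # sorted distinct values, no inverse array needed
    # Positions i where a new range begins at uniq[i + 1].
    breaks = np.flatnonzero(np.diff(uniq) > gap_length + 1)
    starts = np.concatenate((uniq[:1], uniq[breaks + 1]))
    stops = np.concatenate((uniq[breaks], uniq[-1:]))
    # Each value lies in the last range whose start is <= the value.
    ids = np.searchsorted(starts, arr, side="right") - 1
    return ids, starts, stops
```

The ids array can be used directly as a grouping key, and a readable label for range `k` is just `f"{starts[k]}-{stops[k]}"`.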
What's the purpose of repeating the ranges in your result list? You could probably write a more elegant solution without the requirement for that quirk. – timgeb
For example, I want to take the dataframe below, group it by age range and gender, and calculate the median height of each group.
Age Gender Height
2 M 30
4 M 60
2 M 33
3 F 50
20 M 180
22 F 166
40 F 150
33 M 172
...
I hope to get a result like the one below, where the Age column is the list mentioned above.
2-5 M 40.5
2-6 F 50.9
10-25 M 150.8
...
So it would be better if I could group the dataframe directly, without generating a mapper and then mapping it back onto the dataframe.
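For the dataframe use case, the range labels can be computed from the Age column and passed straight to `groupby`, so no separate mapper dict is built and re-applied. A minimal sketch with pandas, using the sample rows above; `age_range_labels` is a hypothetical helper, its gap rule (distinct ages at most `gap_length + 1` apart share a range) is an assumption, and `gap_length=5` is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

def age_range_labels(ages, gap_length=0):
    """Hypothetical helper: label each age with its "start-stop" range.

    Assumption: distinct ages at most gap_length + 1 apart share a range.
    Returns a Series aligned with `ages`, usable directly as a groupby key.
    """
    values = ages.to_numpy()
    uniq = np.unique(values)
    breaks = np.flatnonzero(np.diff(uniq) > gap_length + 1)
    starts = np.concatenate((uniq[:1], uniq[breaks + 1]))
    stops = np.concatenate((uniq[breaks], uniq[-1:]))
    ids = np.searchsorted(starts, values, side="right") - 1
    labels = np.array([f"{s}-{e}" for s, e in zip(starts, stops)])
    return pd.Series(labels[ids], index=ages.index, name="AgeRange")

df = pd.DataFrame({
    "Age":    [2, 4, 2, 3, 20, 22, 40, 33],
    "Gender": ["M", "M", "M", "F", "M", "F", "F", "M"],
    "Height": [30, 60, 33, 50, 180, 166, 150, 172],
})

# Group on the computed labels and the Gender column in one step.
result = df.groupby([age_range_labels(df["Age"], gap_length=5), "Gender"])["Height"].median()
print(result)
```

With these sample rows and `gap_length=5`, the ranges come out as "2-4", "20-22", "33-33" and "40-40", and the median height for ("2-4", "M") is 33.0.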