I have a nested dictionary, a defaultdict(dict), whose sub-dicts have int keys and lists of ints as values:

d['A'] = {2092: [1573], 2093: [1576, 1575], 2095: [1577], 2097: [1574]}
d['B'] = {2098: [1], 2099: [2, 3], 2101: [4], 2102: [5]}
d['C'] = {2001: [6], 2003: [7, 8], 2008: [9], 2009: [10]}

I want to join the value lists of sub-keys that are consecutive, meaning the difference between neighboring keys is less than or equal to a predefined distance (e.g. 2 or 3), and collect each joined list in a result list. With the distance set to 2, the joined lists would be:

[1573, 1576, 1575, 1577, 1574]
[1, 2, 3, 4, 5]
[6, 7, 8]
[9, 10]

For A, the keys 2092, 2093, 2095 and 2097 are consecutive because each neighboring difference is <= 2, so their values go into one list, [1573, 1576, 1575, 1577, 1574]. For C, 2001 and 2003 are joined into [6, 7, 8] since their difference is 2, while 2003 and 2008 are not joined since their difference is 5; 2008 and 2009 (difference 1) then form [9, 10].

The following code is based on "Detecting consecutive integers in a list", but it only works when the difference between two keys is 1.

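from itertools import groupby
from operator import itemgetter

results = []
for key, sub_dict in d.items():
    sub_dict_keys = sorted(sub_dict.keys())
    # Group keys by (index - key); this is constant only for runs that step by 1
    for k, g in groupby(enumerate(sub_dict_keys), lambda ix: ix[0] - ix[1]):
        consecutive_keys = list(map(itemgetter(1), g))
        val_list = []

        # Join the value lists of all keys in this run
        for dict_key in consecutive_keys:
            val_list.extend(sub_dict[dict_key])

        results.append(val_list)

print(results)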

I am wondering how to make the code account for an arbitrary distance.

daiyue

2 Answers

How about using:

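dist = 2  # maximum allowed difference between neighboring keys

results = []
for sub_dict in d.values():
    sub_dict_keys = sorted(sub_dict.keys())
    group = []  # joined values of the current run of close keys
    for k in sub_dict_keys:
        # A gap larger than dist closes the current run
        if group and k > prev_key + dist:
            results.append(group)
            group = []
        group.extend(sub_dict[k])
        prev_key = k
    # Flush the last run of this sub-dict
    if group:
        results.append(group)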

print(results)

It's a lot cleaner and needs no imports, but it doesn't handle empty lists well: a group whose value lists are all empty is never appended to results.
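One possible way around the empty-list caveat is to use None as the "no group started yet" marker instead of relying on the truthiness of the list. The following is only a sketch of that idea, not the answerer's code; the function name join_close_keys and its keep_empty parameter are hypothetical and introduced here for illustration.

```python
from collections import defaultdict


def join_close_keys(d, dist, keep_empty=True):
    """Join the value lists of sub-keys whose neighboring gap is <= dist.

    join_close_keys and keep_empty are illustrative names (assumptions),
    not part of the original answer.
    """
    results = []
    for sub_dict in d.values():
        group = None      # None means "no group started yet"
        prev_key = None
        for k in sorted(sub_dict):
            # Close the current group when the gap to the previous key is too large.
            if group is not None and k - prev_key > dist:
                if group or keep_empty:
                    results.append(group)
                group = None
            if group is None:
                group = []
            group.extend(sub_dict[k])
            prev_key = k
        # Flush the last group of this sub-dict.
        if group is not None and (group or keep_empty):
            results.append(group)
    return results


d = defaultdict(dict)
d['A'] = {2092: [1573], 2093: [1576, 1575], 2095: [1577], 2097: [1574]}
d['B'] = {2098: [1], 2099: [2, 3], 2101: [4], 2102: [5]}
d['C'] = {2001: [6], 2003: [7, 8], 2008: [9], 2009: [10]}

print(join_close_keys(d, 2))
```

With keep_empty=True, a run of keys whose lists are all empty yields an empty list in the result; with keep_empty=False it is dropped, matching the behavior of the loop above.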

yinnonsanders

In this line

for k, g in groupby(enumerate(sub_dict_keys), lambda ix: ix[0] - ix[1]):

you group the keys of the sub-dict by the difference between their index in the sorted order and their value. For 'A', the grouping keys are:

  • 0 - 2092 = -2092
  • 1 - 2093 = -2092
  • 2 - 2095 = -2093
  • 3 - 2097 = -2094

The grouping keys match only for the first two values, so only those are grouped together, and print(results) gives

[1573, 1576, 1575], [1577], [1574]

for 'A'.
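The grouping keys listed above can be reproduced with a few lines. This is a small check using the sorted keys of 'A' from the question; the variable names group_keys and groups are chosen here for illustration.

```python
from itertools import groupby

sub_dict_keys = [2092, 2093, 2095, 2097]  # sorted keys of 'A'

# The grouping key used in the question is index minus value.
group_keys = [i - k for i, k in enumerate(sub_dict_keys)]
print(group_keys)  # [-2092, -2092, -2093, -2094]

# groupby starts a new group whenever the grouping key changes.
groups = [[k for _, k in g]
          for _, g in groupby(enumerate(sub_dict_keys), lambda ix: ix[0] - ix[1])]
print(groups)  # [[2092, 2093], [2095], [2097]]
```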

I don't think this is what you want. You always need to compare two neighboring items of a sub-dict, and itertools.groupby won't help with that: it computes the grouping key from one item at a time, without taking the neighboring items into account.
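The neighbor comparison described here can be sketched as a plain loop that compares each key with the last key of the current run. The helper name split_by_gap is hypothetical and not part of itertools.

```python
def split_by_gap(sorted_keys, dist):
    """Split sorted keys into runs whose neighboring keys differ by <= dist.

    split_by_gap is an illustrative name (an assumption), not a library function.
    """
    runs = []
    for k in sorted_keys:
        # Compare with the last key of the current run (the left neighbor).
        if runs and k - runs[-1][-1] <= dist:
            runs[-1].append(k)
        else:
            runs.append([k])
    return runs


print(split_by_gap([2001, 2003, 2008, 2009], 2))  # [[2001, 2003], [2008, 2009]]
print(split_by_gap([2092, 2093, 2095, 2097], 2))  # [[2092, 2093, 2095, 2097]]
```

Each run of keys can then be mapped to its joined value list by extending a list with sub_dict[k] for every k in the run.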

Jeronimo