
I am calculating the percentage difference between every pair of values and then creating groups. The output I get is a list of 2-value combinations, but I want to combine into one group all the values that are within 30% of each other.

Working code is given below


from itertools import combinations

def pctDiff(A,B):
    return abs(A-B)*200/(A+B)

def main():
    dict2={}
    dict ={'acct_number':10202,'acct_name':'abc','v1_rev':3000,'v2_rev':4444,'v4_rev':234534,'v5_rev':5665,'v6_rev':66,'v7_rev':66,'v3_rev':66}
    vendors_revenue_list =['v1_rev','v2_rev','v3_rev','v4_rev','v5_rev','v6_rev','v7_rev','v8_rev']
    #prepared list of vendors
    for k in vendors_revenue_list:
        if k in dict.keys():
            dict2.update({k: dict[k]})

    print(dict2)
    #provides all possible combination
    # keep every pair whose values are within 30% of each other
    groups = [(a, b) for a, b in combinations(dict2, 2) if pctDiff(dict2[a], dict2[b]) <= 30]

    print(groups)

if __name__ == '__main__':
    main()

Output:

[('v2_rev', 'v5_rev'), ('v3_rev', 'v6_rev'), ('v3_rev', 'v7_rev'), ('v6_rev', 'v7_rev')]

Desired output:

[('v2_rev', 'v5_rev'), ('v3_rev', 'v6_rev','v7_rev')]
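
For reference, the arithmetic behind the pairs above can be checked by hand. The sketch below restates the question's formula under the hypothetical name `pct_diff` and applies it to three of the sample revenues; it divides the absolute difference by the mean of the two values, so the result does not depend on argument order.

```python
def pct_diff(a, b):
    """Symmetric percentage difference: |a - b| relative to the mean of a and b."""
    return abs(a - b) * 200 / (a + b)

# Three of the revenues from the question's sample data
v1, v2, v5 = 3000, 4444, 5665

d_v2_v5 = pct_diff(v2, v5)  # |4444 - 5665| * 200 / 10109, under the 30% limit
d_v1_v2 = pct_diff(v1, v2)  # |3000 - 4444| * 200 / 7444, over the 30% limit

print(f"v2 vs v5: {d_v2_v5:.2f}%")
print(f"v1 vs v2: {d_v1_v2:.2f}%")
```

This is why `('v2_rev', 'v5_rev')` appears in the output while no pair containing `v1_rev` does.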


pbh

5 Answers


You can run a binary search on the sorted values to find, for each value taken as a reference point, the range of keys whose values are within 30% of that reference value:

from bisect import bisect_left, bisect_right

D = {"A":100, "B":110, "C":120, "D":150, "E":160, "F":250}

keys = sorted(D, key=D.get)   # keys in ascending value order
*values, = map(D.get, keys)   # values in the same order (for binary search)

maxPct = 30
ratio = (200 + maxPct) / (200 - maxPct)  # value ratio at which pctDiff equals maxPct
groups = set()                           # a set, to avoid duplicate groups
for refValue in values:
    start = bisect_left(values, refValue / ratio)   # first value within range below
    end = bisect_right(values, refValue * ratio)    # one past the last value within range above
    if end - start < 2:                             # need at least 2 values
        continue
    groups.add(tuple(keys[start:end]))              # add group

output:

print(groups)
{('A', 'B', 'C'), ('C', 'D', 'E'), ('A', 'B', 'C', 'D', 'E')}
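
The `ratio` used above follows from the question's formula: for 0 < a <= b, `pctDiff(a, b) <= p` means `200*(b - a) <= p*(a + b)`, which rearranges to `b/a <= (200 + p)/(200 - p)`. A small sanity check of that bound, with `pct_diff` as a hypothetical restatement of the question's function:

```python
def pct_diff(a, b):
    return abs(a - b) * 200 / (a + b)

max_pct = 30
ratio = (200 + max_pct) / (200 - max_pct)

ref = 120                # reference value "C" from the example
upper = ref * ratio      # largest value still within max_pct of ref
lower = ref / ratio      # smallest value still within max_pct of ref

# Both bounds sit at the 30% limit, up to floating-point rounding
print(pct_diff(ref, upper), pct_diff(ref, lower))
```

Because the bounds are floats, a value lying exactly on the 30% boundary could land on either side of the `bisect` cut; this is an assumption worth testing if such values matter.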
Alain T.
  • For example: A:100, B:110, C:120, D:150, E:160. I need the output to be (a,b,c), (d,e,c) and (c,a,b,d,e). Out of these I will pick the group that has the most elements. I have added a snapshot to my post to show how I arrive at this. – pbh Mar 09 '23 at 19:50
  • Hello @Alain, your bisect example does not produce the one group that contains all the values (a,b,c,d,e). We need it because C is within 30% of all the other values. – pbh Mar 09 '23 at 20:27
  • I adjusted my answer to build groups around reference values (which I hadn't understood initially) – Alain T. Mar 09 '23 at 21:57
  • Thanks @Alain, your solution resolves almost all of my scenarios. I am testing it with a few examples and handling corner cases. – pbh Mar 09 '23 at 22:15
  • Yesterday you also gave another solution, which was working perfectly fine for me. I just want to understand what this step is doing: `groups = [tuple(g) for _, (*g,) in groupby(keys, lambda _: next(G)) if len(g) > 1]` – pbh Mar 10 '23 at 20:41
  • It was grouping each key based on the corresponding value in the G iterator (accumulate). With `_,(*g,)` the output of groupby() is directly converted to a list (in `g`) which lets me get the length and only select the ones with more than one item. I replaced that previous solution because it wasn't producing some of the expected output (namely A,B,C,D,E) which required a different tactic. – Alain T. Mar 10 '23 at 22:17
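
The selection step mentioned in the comments (keeping the group with the most elements) could be sketched as follows. The `groups` literal is copied from the answer's output; sorting before taking the maximum is an assumption added here so that ties between equally long groups are broken the same way on every run, since set iteration order is not guaranteed.

```python
# Groups as produced by the bisect-based answer
groups = {('A', 'B', 'C'), ('C', 'D', 'E'), ('A', 'B', 'C', 'D', 'E')}

# max() returns the first longest item it meets; sorting makes that choice repeatable
largest = max(sorted(groups), key=len)

print(largest)
```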

I couldn't think of a way to do it with combinations, so I opted to just nest a loop and append values to a tuple which satisfy the condition:

    # greedily collect, for each not-yet-grouped key, the later keys within 30% of it
    added = [False]*len(dict2)     # keep track of already added values
    groups = []
    for i, a in enumerate(dict2.items()):
        if added[i]:
            continue
        tup = (a[0],)        # initial element to tuple
        for j, b in enumerate(list(dict2.items())[i+1:], start=i+1):
            if added[j]:
                continue
            if (pctDiff(a[1], b[1]) <= 30):
                tup = (*tup, b[0])               # extend tuple with new value
                added[i], added[j] = True, True  # mark values as added
        if len(tup) > 1:                         # only append if a different match is found
            groups.append(tup)

    print(groups)

Output:

[('v2_rev', 'v5_rev'), ('v3_rev', 'v6_rev', 'v7_rev')]
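
A self-contained sketch of the same greedy idea, wrapped in a function so it can be tried on other data. The names `greedy_groups` and `pct_diff` are hypothetical and not part of the answer. Each group is built around its first member only, so the groups never overlap and the result depends on the order of the keys.

```python
def pct_diff(a, b):
    return abs(a - b) * 200 / (a + b)

def greedy_groups(values, max_pct=30):
    """Group keys around the first ungrouped key; each key joins at most one group."""
    keys = list(values)
    used = set()
    groups = []
    for i, first in enumerate(keys):
        if first in used:
            continue
        group = [first]
        for other in keys[i + 1:]:
            # compare only against the first member of the group
            if other not in used and pct_diff(values[first], values[other]) <= max_pct:
                group.append(other)
                used.add(other)
        if len(group) > 1:  # keep only groups where a match was found
            used.add(first)
            groups.append(tuple(group))
    return groups

sample = {"A": 100, "B": 110, "C": 120, "D": 150, "E": 160, "F": 250}
print(greedy_groups(sample))
```

With this sample, C is within 30% of D and E as well, but it has already been claimed by the group that starts at A, so it is not repeated.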
B Remmelzwaal

I came up with this solution, which does not use the itertools package.

def get_keys_by_value(d, v):
    return [k for k, val in d.items() if val == v]

revenue_list = sorted([dict[k] for k in vendors_revenue_list if k in dict])

results=[]
lastpctdiff=-1
current_group=set()
for i in range(len(revenue_list)-1):
    
    pctdiff = pctDiff(revenue_list[i+1],revenue_list[i])
    
    if pctdiff < 30:

        if pctdiff != lastpctdiff:
            current_group = set()

        current_group.update(get_keys_by_value(dict,revenue_list[i]))
        current_group.update(get_keys_by_value(dict,revenue_list[i+1]))

        if current_group not in results:
            results.append(current_group)
print(results)

Output:

[{'v6_rev', 'v7_rev', 'v3_rev'}, {'v2_rev', 'v5_rev'}]

I think you might want to use a sliding window algorithm (see "Rolling or sliding window iterator?", for example). First, get a list of the revenues sorted by value, keeping each revenue associated with its enterprise. Then, for each window produced by the sliding window algorithm, calculate the percentage difference between the lowest and highest items in that window and report the window if the difference is less than 30%.

Here is some illustrative code:

## includes code adapted from https://stackoverflow.com/a/6822773/131187

def percent_diff(a,b):
    ## True when b differs from a by less than 30% of a
    return abs(100*(a-b)/a) < 30

## original data
d ={'acct_number':10202,'acct_name':'abc','v1_rev':3000,'v2_rev':4444,'v4_rev':234534,'v5_rev':5665,'v6_rev':66,'v7_rev':66,'v3_rev':66}

## dictionaries lack order, so extract the required items for subsequent use as a list of tuples, 
## sorted by revenue

required_items = [key for key in d.keys() if '_rev' in key]

rev_items = [(d[_], _) for _ in required_items]
rev_items.sort()
print(rev_items )

n = len (rev_items )
print(n, ' revenue items')

## now do the sliding window carry-on
## notice that revenue precedes identification in each tuple, for sorting purposes

seq = range(n)

## making the calculation for window size three only <<----
window_size = 3

## actual sliding window
print ('Taking ', window_size, ' at a time')
for i in range(len(seq) - window_size + 1):
    result = seq[i: i + window_size]
    low_index, high_index = min(result), max(result)
    print('indices:', low_index, high_index, end='' )
    low, high = rev_items[low_index][0], rev_items[high_index][0]
    print (' values:', low, high, end='')
    if percent_diff(low, high):
        print (' within')
    else:
        print(' outside')
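
The code above checks windows of one fixed size. A variant that lets the window grow as far as the 30% limit allows, and reports only the windows that are not contained in an earlier one, might look like the sketch below. The function name `maximal_windows` is hypothetical, the percentage is measured relative to the lowest value in the window as in the answer, and revenues are assumed to be positive.

```python
def maximal_windows(items, max_pct=30):
    """items: (revenue, name) pairs. Returns the widest runs of sorted items
    whose highest revenue is less than max_pct percent above the lowest."""
    items = sorted(items)
    windows = []
    end = prev_end = 0
    for start in range(len(items)):
        end = max(end, start + 1)
        low = items[start][0]
        # extend the window while the next revenue stays within range of `low`
        while end < len(items) and 100 * (items[end][0] - low) / low < max_pct:
            end += 1
        # a window that ends where the previous one did is contained in it
        if end - start >= 2 and end > prev_end:
            windows.append(tuple(name for _, name in items[start:end]))
        prev_end = end
    return windows

d = {'v1_rev': 3000, 'v2_rev': 4444, 'v4_rev': 234534, 'v5_rev': 5665,
     'v6_rev': 66, 'v7_rev': 66, 'v3_rev': 66}
print(maximal_windows((rev, name) for name, rev in d.items()))
```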
Bill Bell

Here is the solution that handles all my scenarios:

from itertools import groupby,accumulate

def pctDiff(A,B):
    return abs(A-B)*200/(A+B)

def main():
    D={}
    dict ={'acct_number':10202,'acct_name':'abc','v1_rev':100,'v2_rev':110,'v4_rev':2,'v5_rev':200,'v6_rev':210,'v7_rev':60000000,'v3_rev':2000}
    # dict ={'acct_number':10202,'acct_name':'abc','v1_rev':200,'v2_rev':210,'v4_rev':2,'v5_rev':200,'v6_rev':210,'v7_rev':60000000,'v3_rev':200}
    #dict = {'acct_number': 10202, 'acct_name': 'abc', 'v1_rev': 100, 'v2_rev': None, 'v4_rev': None, 'v5_rev': None,'v6_rev': None, 'v7_rev': None, 'v3_rev': None}
    #dict = {'acct_number': 10202, 'acct_name': 'abc', 'v1_rev': 100,'v2_rev':110,'v3_rev':300}
    vendors_revenue_list =['v1_rev','v2_rev','v4_rev','v5_rev','v6_rev','v8_rev' ,'v3_rev', 'v7_rev']
    # keep only the vendors that are present and have a revenue value

    for k in vendors_revenue_list:
        if k in dict.keys() and dict[k] is not None:
            D.update({k: dict[k]})
    print(f'D {D}')
    find_winner_percentage_based(D)


def find_winner_percentage_based(D):
    keys = sorted(D, key=D.get)  # sorting keys in value ascending  order
    print(f'keys {keys}')
    *values, = map(D.get, keys)  # ordered values (for binary search)
    print(f'values {values}')
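    if len(keys) == 1:  # only one vendor has provided data
        groups = [tuple(keys)]
    else:
        for z in zip(values[:1] + values, values):
            print(z)
        print('1********')

        # zip pairs each value with its predecessor (the first value is paired with itself),
        # so every z is a tuple of length 2. The comparison is True wherever the gap to the
        # previous value is 30% or more; accumulate sums those booleans into a running
        # group number that increases at each such gap.
        G = accumulate(pctDiff(*z) >= 30 for z in zip(values[:1] + values, values))
        # groupby calls the key function once per key. The function ignores its argument
        # (named _ by convention) and returns next(G), the group number for that position.
        # (*g,) unpacks each group iterator into a list so that len(g) can be tested.
        groups = [tuple(g) for _, (*g,) in groupby(keys, lambda _: next(G)) if len(g) > 1]

        print('3********')

    print(groups)
    if not groups:  # no two values are within 30% of each other
        return
    result = groups.pop()  # last group, i.e. the one with the largest values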


    print(f'result {result}')
    print('4********')
    print(list(result))
    #keys=sorted(k for k in D if k in result) for now commenting it we don need this
    *values, = map(D.get, result)  # ordered values (for binary search)
    print('6********')
    print(values.pop())


if __name__ == '__main__':
   main()

The only line I am still trying to understand is:

groups = [tuple(g) for _, (*g,) in groupby(keys, lambda _: next(G)) if len(g) > 1]
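
One way to see what that line does is to unroll it into separate steps. The sketch below uses the sorted keys and values that the first sample `dict` in this answer produces; the intermediate names `breaks`, `group_ids` and `ids` are hypothetical and only there for illustration.

```python
from itertools import accumulate, groupby

def pct_diff(a, b):
    return abs(a - b) * 200 / (a + b)

# Keys and values already sorted by value, as in find_winner_percentage_based
keys = ['v4_rev', 'v1_rev', 'v2_rev', 'v5_rev', 'v6_rev', 'v3_rev', 'v7_rev']
values = [2, 100, 110, 200, 210, 2000, 60000000]

# Step 1: True wherever a value is 30% or more away from its predecessor
breaks = [pct_diff(prev, cur) >= 30 for prev, cur in zip(values[:1] + values, values)]

# Step 2: a running count of breaks gives every position a group number
group_ids = list(accumulate(map(int, breaks)))

# Step 3: groupby asks the key function for a label once per key, in order;
# the function ignores the key itself and hands out the next group number
ids = iter(group_ids)
groups = []
for _, members in groupby(keys, key=lambda _: next(ids)):
    members = list(members)       # same effect as the (*g,) unpacking
    if len(members) > 1:          # drop values that have no neighbour within 30%
        groups.append(tuple(members))

print(group_ids)
print(groups)
```

Neighbouring keys that share a group number end up in the same tuple, and a new tuple starts wherever the number increases.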
pbh