How do I group by with name, preserve keys and names using itertools?

Question

I am working with a data set that is a simple SQL Query that fetches the desired rows.

[(2, 5, 'JOHN K', 'YAHOO'), (2, 6, 'AARON M', 'YAHOO'), (2, 7, 'NICK C', 'YAHOO'), (1, 2, 'CELESTE G', 'GOOGLE'), (1, 3, 'RICH M', 'GOOGLE'), (1, 4, 'SANDEEP C', 'GOOGLE')]

What I have so far that yields the grouping without keys -

import itertools
import operator


def accumulate(rows):
    # itemgetter fetches and groups them by company name(3)
    it = itertools.groupby(rows, operator.itemgetter(3))
    k = {}
    for key, subiter in it:
        k[key] = ';'.join(item[2] for item in subiter)
    return k


if __name__ == '__main__':

    rows = [(2, 5, 'JOHN K', 'YAHOO'), (2, 6, 'AARON M', 'YAHOO'), (2, 7, 'NICK C', 'YAHOO'), (1, 2, 'CELESTE G', 'GOOGLE'), (1, 3, 'RICH M', 'GOOGLE'), (1, 4, 'SANDEEP C', 'GOOGLE')] 
    groupedby = (accumulate(rows))

    print(groupedby)

Output -

{'YAHOO': 'JOHN K;AARON M;NICK C', 'GOOGLE': 'CELESTE G;RICH M;SANDEEP C'}

Desired Output preserve the keys and still do the grouping -

{('YAHOO,2'): '(JOHN K,5);(AARON M,6);(NICK C,7)', ('GOOGLE,1'): '(CELESTE G,2);(RICH M,3);(SANDEEP C,4)'}

I am open to some other data structure that is not comma separated, using pipes or may be a tuple.

for key, subiter in it:
    k[key, ] = ';'.join(item[2] for item in subiter)

Any help is appreciated!

d.b · Accepted Answer · 2022-10-25T20:22:42.983

2

# 1
ans = {}
for a, b, c, d in arr:
    ans.setdefault("".join(["(", ",".join([d, str(a)]), ")"]), []).\
        append("".join(["(", ",".join([c, str(b)]), ")"]))
{k: ";".join(v) for k, v in ans.items()}
# {'(YAHOO,2)': '(JOHN K,5);(AARON M,6);(NICK C,7)',
#  '(GOOGLE,1)': '(CELESTE G,2);(RICH M,3);(SANDEEP C,4)'}

# 2
ans = {}
for el in arr:
    a, b, c, d = el
    key = "".join(["(", ",".join([d, str(a)]), ")"])
    val = "".join(["(", ",".join([c, str(b)]), ")"])
    if ans.get(key) is None:
        ans[key] = [val]
    else:
        ans[key].append(val)

for k, v in ans.items():
    ans[k] = ";".join(v)

ans
# {'(YAHOO,2)': '(JOHN K,5);(AARON M,6);(NICK C,7)',
#   '(GOOGLE,1)': '(CELESTE G,2);(RICH M,3);(SANDEEP C,4)'}

# I would just do this
ans = {}
for a, b, c, d in arr:
    ans.setdefault((d, a), []).append((c, b))
ans
# {('YAHOO', 2): [('JOHN K', 5), ('AARON M', 6), ('NICK C', 7)],
#  ('GOOGLE', 1): [('CELESTE G', 2), ('RICH M', 3), ('SANDEEP C', 4)]}

# Data
arr = [(2, 5, 'JOHN K', 'YAHOO'),
       (2, 6, 'AARON M', 'YAHOO'),
       (2, 7, 'NICK C', 'YAHOO'),
       (1, 2, 'CELESTE G', 'GOOGLE'),
       (1, 3, 'RICH M', 'GOOGLE'),
       (1, 4, 'SANDEEP C', 'GOOGLE')]

edited Oct 25 '22 at 20:22

answered Oct 25 '22 at 16:29

d.b

32,245
6
36
77

Super Pythonic solution but could you please provide comments on what each line is doing so your thought process can be understood? Thanks! – ThinkCode Oct 25 '22 at 19:27
Thank you so much for the multiple approaches. Can we limit these to 2 persons per company? Say, Google has 3 employees in the list, then we get {('GOOGLE', 1): [('CELESTE G', 2), ('RICH M', 3)], ('GOOGLE', 1):[('SANDEEP C', 4)]} ? Thank you! – ThinkCode Oct 26 '22 at 02:43
1

@ThinkCode. That wouldn't be possible as written as each key must be unique, and you're using the same key twice. You have to add a check for if the length of the list is >1, and either throw away the employees after the second, or make up a new subkey, like (('GOOGLE', 1), 1) and (('GOOGLE', 1), 2). Probably better to make the full list here and handle any subsequent data splitting per company later in your processing. – nigh_anxiety Oct 28 '22 at 04:48
Thank you, yeah I am handling based on length. Was wondering if there is a pythonic solution for that. – ThinkCode Oct 28 '22 at 13:18

nigh_anxiety · Answer 2 · 2022-10-25T16:43:47.597

Here's a solution using groupby with the full key you want as a tuple, and putting the matches into a list of tuples, instead of a semicolon-delimited string.

import itertools
import operator


def accumulate(rows):
    # use lambda function to build a groupby tuple from pieces (3,0)
    it = itertools.groupby(rows, lambda x: (x[3], x[0]))
    k = {}
    for key, subiter in it:
        # for the specified key, use list comprehension to create tuples of desired elements from each row in the group.
        k[key] = [(item[2], item[1]) for item in subiter]
    return k


if __name__ == "__main__":

    rows = [
        (2, 5, "JOHN K", "YAHOO"),
        (2, 6, "AARON M", "YAHOO"),
        (2, 7, "NICK C", "YAHOO"),
        (1, 2, "CELESTE G", "GOOGLE"),
        (1, 3, "RICH M", "GOOGLE"),
        (1, 4, "SANDEEP C", "GOOGLE"),
    ]
    groupedby = accumulate(rows)

    print(groupedby)

output:

{
 ('YAHOO', 2): [('JOHN K', 5), ('AARON M', 6), ('NICK C', 7)],
 ('GOOGLE', 1): [('CELESTE G', 2), ('RICH M', 3), ('SANDEEP C', 4)]
}

How do I group by with name, preserve keys and names using itertools?

2 Answers2