Group a list of tuples into a list of tuples with lists in Python

Question

I have a list of tuples. Each tuple holds a patient and a visit, and a patient can have several visits.

I want to get a list of patients and, for each patient, the list of their visits.

For example, from

[(patient1, visit), (patient2, visit), (patient1, visit)]

to

[(patient1, [visit, visit]), (patient2, [visit])]

I tried an approach based on JavaScript's reduce function, but I can't work out how to do it in Python.

should your expected output be `[('patient', ['visit', 'visit', 'visit'])]`? — It_is_Chris, Oct 26 '21 at 13:10
This looks more like it should be a dictionary where patient is the key and the associated value is a list of visits. — Matthias, Oct 26 '21 at 13:10
@Matthias yes, it can be a dictionary, but I wonder how can I do that elegantly — Israel kusayev, Oct 26 '21 at 13:11
You can use itertools.groupby. See https://stackoverflow.com/questions/773/how-do-i-use-itertools-groupby — DarrylG, Oct 26 '21 at 13:11
@DarrylG I don't think groupby seems like the right approach. can you provide an example of how to use it here? I tried but couldn't figure it out. — rv.kvetch, Oct 26 '21 at 13:22

user2390182 · Answer 1 · 2021-10-26T13:45:33.490

The defaultdict approach is the standard way and has linear complexity. You can also use a plain dict with dict.setdefault:

d = {}
for patient, visit in data:
    d.setdefault(patient, []).append(visit)
[*d.items()]
# [('patient1', ['visit', 'visit']), ('patient2', ['visit'])]
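
As a minimal, self-contained sketch of the setdefault approach: the sample `data` list and the helper name `group_visits` are assumptions made for illustration and are not part of the question.

```python
def group_visits(pairs):
    """Group (patient, visit) pairs into (patient, [visits]) tuples."""
    grouped = {}
    for patient, visit in pairs:
        # setdefault returns the existing list, or inserts a new one and returns it
        grouped.setdefault(patient, []).append(visit)
    # dicts keep insertion order (Python 3.7+), so patients come out in first-seen order
    return list(grouped.items())


# Hypothetical sample data
data = [("patient1", "visit_a"), ("patient2", "visit_b"), ("patient1", "visit_c")]
result = group_visits(data)
print(result)
```

Each pair is visited once and each dict operation is constant time on average, which is where the linear complexity comes from.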

For a one-line approach (excluding imports), albeit only log-linear because the data has to be sorted first, you can use itertools.groupby:

from itertools import groupby
from operator import itemgetter as ig

[(k, [*map(ig(1), g)]) for k, g in groupby(sorted(data), key=ig(0))]
# [('patient1', ['visit', 'visit']), ('patient2', ['visit'])]
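
A short sketch of why the sort matters, using hypothetical sample data: groupby only merges adjacent items with equal keys. Note also that sorted(data) orders the whole tuples, so it reorders each patient's visits too. Sorting on the patient alone is a variation that relies on Python's stable sort to keep the visits in input order.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data; patient1's visits are deliberately not in alphabetical order
data = [("patient1", "visit_z"), ("patient2", "visit_b"), ("patient1", "visit_a")]

# Without sorting, groupby only merges adjacent equal keys, so patient1 shows up twice
unsorted_groups = [(k, [v for _, v in g]) for k, g in groupby(data, key=itemgetter(0))]

# Sorting on the patient only: the sort is stable, so visits keep their input order
by_patient = sorted(data, key=itemgetter(0))
grouped = [(k, [v for _, v in g]) for k, g in groupby(by_patient, key=itemgetter(0))]

print(unsorted_groups)
print(grouped)
```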

score 0 · Answer 2 · answered Oct 26 '21 at 13:13

You can use a collections.defaultdict in the following way:

from collections import defaultdict

d = defaultdict(list)
for patient, visit in data:
    d[patient].append(visit)

a_guest


I want the `d` to contain the patient itself – Israel kusayev Oct 26 '21 at 13:14
@Israelkusayev You can then use `list(d.items())` to get the format you describe in your question. – a_guest Oct 26 '21 at 13:19
@Israelkusayev `d` does contain the patient. It's the key of each entry. @Matthias's comment points out that a dict matches your data better, because it guarantees that the patient ids will be unique. The list of tuples that you say you want would not prevent you from having the same patient twice, so you would have to code your own check to guarantee that whenever you update the data. – BoarGules Oct 26 '21 at 13:19
This is the right approach. In fact, in the latter result from above, that is actually *directly convertible* to a dict. That is, you can pass that list of key-value pairs to `dict()` and it will happily accept it, because the end data is the same, more or less. – rv.kvetch Oct 26 '21 at 13:23
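
To make the points in these comments concrete, here is a small sketch with hypothetical sample data. It builds the defaultdict, converts it to the list of tuples asked for in the question, and shows that the list can be passed back to `dict()`.

```python
from collections import defaultdict

# Hypothetical sample data
data = [("patient1", "visit_a"), ("patient2", "visit_b"), ("patient1", "visit_c")]

d = defaultdict(list)
for patient, visit in data:
    d[patient].append(visit)  # a missing key gets an empty list automatically

# The patient is the key of each entry; items() yields (patient, visits) pairs
pairs = list(d.items())

# The list of key-value pairs converts straight back to a plain dict
round_trip = dict(pairs)

print(pairs)
print(round_trip)
```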

DarrylG · Answer 3 · 2021-10-26T13:40:00.107

Example Using itertools.groupby

from itertools import groupby

# Example data
records = [('bill', '1/1/2021'), ('mary', '1/2/2021'), ('janet', '1/3/2021'), ('bill', '3/5/2021'), ('mary', '4/25/2021')]

# Sort so that equal names are adjacent, then group on the first element of each tuple (the name)
g = groupby(sorted(records), lambda kv: kv[0])

# Use a list comprehension over the groups to produce the desired tuples
result = [(name, [visit for _, visit in group]) for name, group in g]
# [('bill', ['1/1/2021', '3/5/2021']), ('janet', ['1/3/2021']), ('mary', ['1/2/2021', '4/25/2021'])]

The same code as a one-liner:

result = [(name, [visit for _, visit in group]) for name, group in groupby(sorted(records), lambda kv: kv[0])]

edited Oct 26 '21 at 13:40

answered Oct 26 '21 at 13:32


Note that this requires the `patient`'s class to implement a total ordering. For strings that is no problem, but perhaps these are more complex objects such as a frozen dataclass. The dict approach only requires these objects to be hashable, which seems much more natural for patient data than defining a total ordering. – a_guest Oct 26 '21 at 13:57
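
A sketch of the situation this comment describes. The `Patient` dataclass and the sample records are hypothetical; the point is only that a frozen dataclass is hashable by default but does not define an ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen + default eq=True makes instances hashable
class Patient:  # hypothetical patient type; no order=True, so "<" is undefined
    name: str


data = [
    (Patient("bill"), "1/1/2021"),
    (Patient("mary"), "1/2/2021"),
    (Patient("bill"), "3/5/2021"),
]

# Sorting needs "<" between Patient objects, which this class does not provide
try:
    sorted(data)
    sortable = True
except TypeError:
    sortable = False

# The dict approach only needs hashing and equality
grouped = {}
for patient, visit in data:
    grouped.setdefault(patient, []).append(visit)
result = list(grouped.items())

print(sortable)
print(result)
```

Passing `order=True` to the dataclass decorator would make the sort work, at the cost of defining an ordering that may have no real meaning for patients.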
