List of tuples for each pandas dataframe slice

Question

I need to do something very similar to this question: Pandas convert dataframe to array of tuples

The difference is that I need not a single list of tuples for the entire DataFrame, but a list of lists of tuples, split according to the value of some column.

Supposing this is my data set:

         t_id     A     B
         ----  ---- -----
    0    AAAA     1   2.0
    1    AAAA     3   4.0
    2    AAAA     5   6.0
    3    BBBB     7   8.0
    4    BBBB     9  10.0
    ...

I want to produce as output:

        [[(1,2.0), (3,4.0), (5,6.0)],[(7,8.0), (9,10.0)]]

That is, one list for 'AAAA', another for 'BBBB' and so on.

I've tried with two nested for loops. It seems to work, but it is taking too long (actual data set has ~1M rows):

    result = []
    for t in df['t_id'].unique():
        tuple_list = []

        # iterrows() yields (index, row) pairs, so x[1] is the row
        for x in df[df['t_id'] == t].iterrows():
            row = x[1][['A', 'B']]
            tuple_list.append(tuple(row))

        result.append(tuple_list)

Is there a faster way to do it?

score 2 · Answer 1 · answered Aug 15 '21 at 01:00

You can group by the `t_id` column, iterate through the groups, and convert each sub-DataFrame into a list of tuples:

    [g[['A', 'B']].to_records(index=False).tolist() for _, g in df.groupby('t_id')]
    # [[(1, 2.0), (3, 4.0), (5, 6.0)], [(7, 8.0), (9, 10.0)]]

answered Aug 15 '21 at 01:00

Psidom

It actually works and it is way faster than my for loops, thanks. Can you please break down the `for _, g` part? What does `groupby` return? According to the DataFrame documentation, it returns only a `DataFrameGroupBy` object. – thatOldITGuy Aug 15 '21 at 11:57
`DataFrameGroupBy` inherits from `BaseGroupBy`, which defines an `__iter__` method: https://github.com/pandas-dev/pandas/blob/126a19d038b65493729e21ca969fbb58dab9a408/pandas/core/groupby/groupby.py#L756. That means you can iterate through the groups and process each one separately. – Psidom Aug 15 '21 at 17:22
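
To illustrate the `for _, g` part, here is a minimal sketch, assuming a small DataFrame rebuilt from the table in the question. Iterating over the object returned by `groupby` yields `(key, sub-DataFrame)` pairs, so `_` simply discards the group key and `g` holds the rows for that key. Note that `itertuples` is swapped in for `to_records` here purely for illustration, and the names `keys` and `result` are made up for this example.

```python
import pandas as pd

# Sample data rebuilt from the question's table (an assumption for this sketch)
df = pd.DataFrame({
    "t_id": ["AAAA", "AAAA", "AAAA", "BBBB", "BBBB"],
    "A": [1, 3, 5, 7, 9],
    "B": [2.0, 4.0, 6.0, 8.0, 10.0],
})

keys = []
result = []
# Each iteration yields the group key and the sub-DataFrame for that key
for key, group in df.groupby("t_id"):
    keys.append(key)
    # name=None makes itertuples return plain tuples instead of namedtuples
    result.append(list(group[["A", "B"]].itertuples(index=False, name=None)))
```

After the loop, `keys` holds the distinct `t_id` values in sorted order and `result` holds one list of `(A, B)` tuples per key.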

score 1 · Accepted Answer · answered Aug 15 '21 at 02:25

I think this should work too:

    import pandas as pd
    import itertools

    df = pd.DataFrame({"A": [1, 2, 3, 1], "B": [2, 2, 2, 2], "C": ["A", "B", "C", "B"]})

    tuples_in_df = sorted(tuple(df.to_records(index=False)), key=lambda x: x[0])
    output = [[tuple(x)[1:] for x in group] for _, group in itertools.groupby(tuples_in_df, lambda x: x[0])]
    print(output)

Out:

    [[(2, 'A'), (2, 'B')], [(2, 'B')], [(2, 'C')]]
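
The example above groups on the first column of its own sample frame. A sketch of how the same idea might look on the layout from the question, assuming a DataFrame rebuilt from the question's table: `itertools.groupby` only merges adjacent items, which is why the rows are sorted by key first. This variant uses `itertuples` rather than `to_records`, so it is an adaptation and not the exact code from the answer; `rows` and `output` are names chosen for this example.

```python
import itertools

import pandas as pd

# Sample data rebuilt from the question's table (an assumption for this sketch)
df = pd.DataFrame({
    "t_id": ["AAAA", "AAAA", "AAAA", "BBBB", "BBBB"],
    "A": [1, 3, 5, 7, 9],
    "B": [2.0, 4.0, 6.0, 8.0, 10.0],
})

# Plain (t_id, A, B) tuples, sorted by t_id so equal keys are adjacent;
# sorted() is stable, so the original row order within each key is preserved
rows = sorted(
    df[["t_id", "A", "B"]].itertuples(index=False, name=None),
    key=lambda r: r[0],
)

# Group adjacent rows by t_id and drop the key from each tuple
output = [
    [r[1:] for r in group]
    for _, group in itertools.groupby(rows, key=lambda r: r[0])
]
```

Putting the grouping column first keeps the key function and the `r[1:]` slice simple.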

answered Aug 15 '21 at 02:25

osint_alex

It does work, and was actually faster than [this](https://stackoverflow.com/a/68788041/16505303) answer – thatOldITGuy Aug 15 '21 at 13:33
