0

Currently, I am working on grouping transactions based on a product ID. The problem is that a product can contain various alternate IDs that can be located in two columns. For example, a product can look like this in the dataset;

requested ID Item ID
123t-34 4s4532t
:------------ -------:
4s4532t. 123t-34
:------------ -------:
123t-34. 4s4532t

This is an example of a much bigger dataset. Any input on how I could group them to look something similar to this;

Id. Alternate Ids
123t-34 (4s4532t,123t-34)

This is my attempt to solve the problem. data_dict = data.groupby('requested_id')['Item ID'].agg('unique')

But this only solved half of the grouping based on one column.

1 Answers1

0

You can use pandas.DataFrame.itertuples to build a tuple from two or more columns.

Try this :

(
    data.assign(AlternateIDs= list(df[df.columns[::-1]].itertuples(index=False, name=None)))
        .rename(columns={'requested ID': 'Id.'})
        .drop(columns='Item ID')
)

# Output :

        Id.         AlternateIDs
0   123t-34   (4s4532t, 123t-34)
1  4s4532t.  (123t-34, 4s4532t.)
2  123t-34.  (4s4532t, 123t-34.)
Timeless
  • 22,580
  • 4
  • 12
  • 30