
I have a DataFrame with multiple columns, and I would like to convert just two of those columns into a Python dictionary.

This is an example of the DataFrame:

    ID   Subject   Reference  
0   A    Elec202   23232
1   A    Comp101   12456
2   B    E2        54235
3   B    Comp222   56654
4   C    Comp123   54467
5   D    E1        21345
6   D    Elec102   85464
7   D    Comp295   23438

The desired output:

subject_details = {"A": ("Elec202", "Comp101"),
                   "B": ("E2", "Comp222"),
                   "C": ("Comp123",),
                   "D": ("E1", "Elec102", "Comp295")
                   }
B A C H A S H

2 Answers


Use groupby and convert each group to a tuple of its unique values:

import pandas as pd

# Both variants give the same result; I don't know which one performs better:
dict(df.groupby('ID')['Subject'].unique().apply(tuple))
# or
dict(df.groupby('ID')['Subject'].apply(lambda x: tuple(pd.unique(x))))

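or use a dict comprehension. dict.fromkeys removes duplicates while keeping the order of appearance; set() would also remove duplicates but does not preserve order:

dic = {k: tuple(dict.fromkeys(v)) for k, v in df.groupby('ID')['Subject']}
print(dic)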
{'A': ('Elec202', 'Comp101'),
 'B': ('E2', 'Comp222'),
 'C': ('Comp123',),
 'D': ('E1', 'Elec102', 'Comp295')}

Input used:

  ID  Subject  Reference
0  A  Elec202      23232
1  A  Comp101      12456
2  B       E2      54235
3  B  Comp222      56654
4  C  Comp123      54467
5  D       E1      21345
6  D  Elec102      85464
7  D  Comp295      23438
8  D       E1      22222
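
To make the approach above reproducible, here is a minimal sketch that rebuilds the input table by hand and applies the first groupby variant. The DataFrame construction is an assumption about how the data was created; only the column names and values are taken from the table.

```python
import pandas as pd

# Rebuild the input shown above, including the duplicate (D, E1) row.
df = pd.DataFrame({
    "ID": ["A", "A", "B", "B", "C", "D", "D", "D", "D"],
    "Subject": ["Elec202", "Comp101", "E2", "Comp222", "Comp123",
                "E1", "Elec102", "Comp295", "E1"],
    "Reference": [23232, 12456, 54235, 56654, 54467,
                  21345, 85464, 23438, 22222],
})

# unique() keeps the first occurrence of each subject within a group,
# so the duplicate E1 for ID "D" appears only once in the resulting tuple.
subject_details = dict(df.groupby("ID")["Subject"].unique().apply(tuple))
print(subject_details)
```

The Reference column is ignored here because only ID and Subject are selected before grouping.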
Rabinzel
Thanks, that worked for me. But there is one issue: some IDs can have duplicate subjects in the dataframe. How do I make the values unique for each ID? For example, E can have a duplicated value, as shown below: { 'E': ('E1', 'Elec102', 'Comp292', 'E1') } – B A C H A S H Sep 30 '22 at 14:39
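
One possible way to handle the duplicate case raised in this comment is to drop repeated (ID, Subject) pairs before grouping. The sketch below is an alternative to the unique() call, not the answerer's method, and the small frame for ID "E" is hypothetical, built only from the values quoted in the comment.

```python
import pandas as pd

# Hypothetical frame in which ID "E" lists the subject "E1" twice.
df = pd.DataFrame({
    "ID": ["E", "E", "E", "E"],
    "Subject": ["E1", "Elec102", "Comp292", "E1"],
})

# Drop repeated (ID, Subject) pairs first (the first occurrence is kept),
# then collect the remaining subjects of each group into a tuple.
deduped = df.drop_duplicates(subset=["ID", "Subject"])
subject_details = deduped.groupby("ID")["Subject"].apply(tuple).to_dict()
print(subject_details)
```

Deduplicating up front keeps the grouping step simple, and the same subject can still appear under different IDs because the pair, not the subject alone, is what gets deduplicated.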

My preferred method

The thread covers a variety of approaches that you can explore.