-1

I've got excel/pandas dataframe/file looking like this:

+------+--------+
|  ID  | 2nd ID |
+------+--------+
| ID_1 |  R_1   |
| ID_1 |  R_2   |
| ID_2 |  R_3   |
| ID_3 |        |
| ID_4 |  R_4   |
| ID_5 |        |
+------+--------+

How can I transform it to python dictionary? I want my result to be like:

{'ID_1':['R_1','R_2'],'ID_2':['R_3'],'ID_3':[],'ID_4':['R_4'],'ID_5':[]}

What should I do, to obtain it?

Rafał
  • 51
  • 10
  • 1
    Does this answer your question? [How to group dataframe rows into list in pandas groupby?](https://stackoverflow.com/questions/22219004/how-to-group-dataframe-rows-into-list-in-pandas-groupby) – sushanth Aug 17 '20 at 08:42
  • @sushanth - No, because missing values. – jezrael Aug 17 '20 at 08:47
  • https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html . What you are trying to do is reverse of the `explode` in pandas – Vishesh Mangla Aug 24 '20 at 08:01

2 Answers2

0

If need remove missing values for not exist values use Series.dropna in lambda function in GroupBy.apply:

d = df.groupby('ID')['2nd ID'].apply(lambda x: x.dropna().tolist()).to_dict()
print (d)
{'ID_1': ['R_1', 'R_2'], 'ID_2': ['R_3'], 'ID_3': [], 'ID_4': ['R_4'], 'ID_5': []}

Or use fact np.nan == np.nan return False in list compehension for filter non missing values, check also warning in docs for more explain.

d = df.groupby('ID')['2nd ID'].apply(lambda x: [y for y in x if y == y]).to_dict()

If need remove empty strings:

d = df.groupby('ID')['2nd ID'].apply(lambda x: [y for y in x if y != '']).to_dict()
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

Apply a function over the dataframe over the rows which appends the value to your dict. Apply is not inplace and thus your dictionary would be created.

d = dict.fromkeys(df.ID.unique(), [])

def func(x):
  
  d[x.ID].append(x["2nd ID"])

# will return a series of Nones
df.apply(func, axis = 1)

Edit:

I asked it on Gitter and @gurukiran07 gave me an answer. What you are trying to do is reverse of explode function

s = pd.Series([[1, 2, 3], [4, 5]])

0    [1, 2, 3]
1       [4, 5]
dtype: object

exploded = s.explode()

0    1
0    2
0    3
1    4
1    5
dtype: object

exploded.groupby(level=0).agg(list)

0    [1, 2, 3]
1       [4, 5]
dtype: object
Vishesh Mangla
  • 664
  • 9
  • 20