
I have a dataframe and want to convert it to a list of dictionaries. I use read_csv() to create this dataframe. The dataframe looks like the following:

             AccountName AccountType  StockName Allocation
          0  MN001       #1           ABC       0.4
          1  MN001       #1           ABD       0.6
          2  MN002       #2           EFG       0.5
          3  MN002       #2           HIJ       0.4
          4  MN002       #2           LMN       0.1 

The desired output:

   [{'ABC':0.4, 'ABD':0.6}, {'EFG':0.5, 'HIJ':0.4,'LMN':0.1}]

I have researched similar questions and tried the DataFrame.to_dict() function, but have not been able to produce this output. Many thanks for your help!

Steve P
  • You want a list of dictionaries where each dictionary contains the StockName as the key and the Allocation as the value, and separate dictionaries for the different AccountNames? Does your dataframe have duplicate StockName values within the same AccountName? – m13op22 Mar 01 '19 at 22:47
  • Yes, I want StockName as the key and Allocation as the value for each AccountName. There are no duplicate StockName values within the same AccountName. Thanks. – Steve P Mar 01 '19 at 23:22

2 Answers

import pandas as pd

# Build the frame from a plain list of rows so Allocation stays numeric
# (wrapping mixed data in np.array would coerce every value to a string).
rows = [['MN001', '#1', 'ABC', 0.4],
        ['MN001', '#1', 'ABD', 0.6],
        ['MN002', '#2', 'EFG', 0.5],
        ['MN002', '#2', 'HIJ', 0.4],
        ['MN002', '#2', 'LMN', 0.1]]
df = pd.DataFrame(rows, columns=['AccountName', 'AccountType', 'StockName', 'Allocation'])

# One dict per account, mapping StockName -> Allocation
by_account_df = (df.groupby('AccountName')
                   .apply(lambda x: dict(zip(x['StockName'], x['Allocation'])))
                   .reset_index(name='dic'))
by_account_lst = by_account_df['dic'].values.tolist()

And the result should be:

print(by_account_lst)
[{'ABC': 0.4, 'ABD': 0.6}, {'EFG': 0.5, 'HIJ': 0.4, 'LMN': 0.1}]
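
Since the question mentions to_dict(), here is a possible variant that uses Series.to_dict() instead of apply(). It is only a sketch: the sample frame is rebuilt from the question's table, and it relies on StockName being unique within each account, as the asker confirmed in the comments.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'AccountName': ['MN001', 'MN001', 'MN002', 'MN002', 'MN002'],
    'AccountType': ['#1', '#1', '#2', '#2', '#2'],
    'StockName': ['ABC', 'ABD', 'EFG', 'HIJ', 'LMN'],
    'Allocation': [0.4, 0.6, 0.5, 0.4, 0.1],
})

# For each account, index its rows by StockName and convert the
# Allocation column into a plain dict.
by_account_lst = [
    group.set_index('StockName')['Allocation'].to_dict()
    for _, group in df.groupby('AccountName')
]
print(by_account_lst)
```

If a StockName did repeat within an account, to_dict() would keep only the last Allocation for that key, so this approach assumes the data has already been checked for duplicates.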
Giorgos Myrianthous

This should do it:

portfolios = []
for _, account in df.groupby('AccountName'):
    portfolio = {stock['StockName']: stock['Allocation'] 
              for _, stock in account.iterrows()}
    portfolios.append(portfolio)

First use the groupby() function to group the rows of the dataframe by AccountName. To access the individual rows (stocks) for each account, you use the iterrows() method. As user @ebb-earl-co explained in the comments, the _ is there as a placeholder variable, because iterrows() returns (index, Series) tuples, and we only need the Series (the rows themselves). From there, use a dict comprehension to create a dictionary mapping StockName -> Allocation for each stock. Finally, append that dictionary to the list of portfolios, resulting in the expected output:

[{'ABC': 0.4, 'ABD': 0.6}, {'EFG': 0.5, 'HIJ': 0.4, 'LMN': 0.1}]
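
To make the role of the _ placeholder more concrete, the following small sketch shows what iterrows() yields. The two-row frame is made up for illustration and is not the asker's full data.

```python
import pandas as pd

# Tiny illustrative frame (a subset of the question's columns).
df = pd.DataFrame({
    'StockName': ['ABC', 'ABD'],
    'Allocation': [0.4, 0.6],
})

# Each item produced by iterrows() is an (index, row) pair,
# where row is a Series holding that row's values.
pairs = list(df.iterrows())
first_index, first_row = pairs[0]
print(first_index)
print(first_row['StockName'])

# When the index is not needed, it is bound to _ by convention.
names = [row['StockName'] for _, row in df.iterrows()]
print(names)
```

The name _ is an ordinary variable; using it simply signals to readers that the value is deliberately ignored.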

One more thing: if you decide later that you want to label each dict in the portfolios with the account name, you could do it like this:

portfolios = []
for acct_name, account in df.groupby('AccountName'):
    portfolio = {stock['StockName']: stock['Allocation'] 
              for _, stock in account.iterrows()}
    portfolios.append({acct_name: portfolio})

This will return a list of nested dicts like this:

[{'MN001': {'ABC': 0.4, 'ABD': 0.6}},
 {'MN002': {'EFG': 0.5, 'HIJ': 0.4, 'LMN': 0.1}}]

Note that in this case, I used the variable acct_name instead of assigning to _ because we actually will use the index to "label" the dicts in the portfolios list.
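
If the labels are mainly there so that a portfolio can be looked up by account, a single dict keyed by account name may be more convenient than a list of one-key dicts. This is a variation on the approach above, not something the question asked for, and the name portfolios_by_account is only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'AccountName': ['MN001', 'MN001', 'MN002', 'MN002', 'MN002'],
    'AccountType': ['#1', '#1', '#2', '#2', '#2'],
    'StockName': ['ABC', 'ABD', 'EFG', 'HIJ', 'LMN'],
    'Allocation': [0.4, 0.6, 0.5, 0.4, 0.1],
})

# Map each account name directly to its StockName -> Allocation dict.
portfolios_by_account = {
    acct_name: dict(zip(account['StockName'], account['Allocation']))
    for acct_name, account in df.groupby('AccountName')
}
print(portfolios_by_account['MN002'])
```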

J. Taylor
  • What does the "_," mean? I am a Python beginner and hope to learn as much as I can. Thanks a lot. – Steve P Mar 01 '19 at 23:25
  • @SteveP it stands for an unused iteration variable. The `.iterrows()` method returns an iterable of `(index, Series)` pairs from the DataFrame. Iterating over it yields one pair at a time, and the loop unpacks each pair into the index and the `Series`. The answer only needs the `Series`, not the index, so the `_` is a Python convention for "I have to bind this value to a name, but I am not going to use it". – call-in-co Mar 01 '19 at 23:33