How to efficiently convert a subdictionary into matrix in python

Question

I have a dictionary like this:

{'test2':{'hi':4,'bye':3}, 'religion.christian_20674': {'path': 1, 'religious': 1, 'hi':1}}

Each value of this dictionary is itself a dictionary.

My output should look like this:

How can I do that efficiently?

I have read this post, but the shape of the matrix there is different from mine.

This one was closest to my case, but it had a set inside the dictionary, not another dictionary.

What is different in my question is that I also want to convert the values of the inner dictionary into the values of the matrix.

I was thinking something like this:

doc_final =[[]]
for item in dic1:
    for item2, value in dic1[item]:
        doc_final[item][item2] = value

but it wasn't the correct way.
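A minimal plain-Python sketch of the nested-loop idea above, assuming the goal (as clarified in the comments) is one row per inner key/value pair, with 0 in every other column. The main fixes are iterating with `.items()` to get (key, value) pairs and appending whole rows to a list instead of indexing an empty list with strings. The `'<outer key>_<inner key>'` row labels and the names `columns`, `col_index` and `row_labels` are illustrative choices, not part of the original question.

```python
# Example data from the question above
dic1 = {'test2': {'hi': 4, 'bye': 3},
        'religion.christian_20674': {'path': 1, 'religious': 1, 'hi': 1}}

# Hypothetical column order: every inner key seen anywhere, sorted alphabetically
columns = sorted({key for inner in dic1.values() for key in inner})
col_index = {name: j for j, name in enumerate(columns)}

row_labels = []
doc_final = []
for item, inner in dic1.items():
    # .items() yields (key, value) pairs; iterating a dict directly yields only keys
    for item2, value in inner.items():
        row = [0] * len(columns)        # start from an all-zero row
        row[col_index[item2]] = value   # put the value in its own column
        row_labels.append('%s_%s' % (item, item2))
        doc_final.append(row)

for label, row in zip(row_labels, doc_final):
    print(label, row)
```

Each inner entry becomes its own row, so the example input yields 5 rows over the 4 columns `bye`, `hi`, `path`, `religious`.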

Thanks for your help :)

Try using pandas - https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_dict.html — dmitryro, Nov 25 '18 at 03:50
@dmitryro thank you for helping me out :). The problem is that my dictionary has a nested dictionary, and I want every item of the nested dictionary to become a new row. Making five rows out of it is the part I'm stuck on :|. The link you shared covers the case where the dictionary holds a set, so for my example it would derive 2 rows, but I am trying to get 5 rows — sariii, Nov 25 '18 at 17:19

The Pineapple · Answer 1 · 2018-11-25T04:02:04.407


Using the pandas library you can easily turn your dictionary into a matrix.

Code:

import pandas as pd

d = {'test2':{'hi':4,'bye':3}, 'religion.christian_20674': {'path': 1, 'religious': 1, 'hi':1}}
df = pd.DataFrame(d).T.fillna(0)

print(df)

Output:

                          bye   hi  path  religious
test2                     3.0  4.0   0.0        0.0
religion.christian_20674  0.0  1.0   1.0        1.0
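For comparison, `pandas.DataFrame.from_dict` (the function linked in the comment on the question) can build the same table without a transpose: with `orient='index'`, the outer keys become the row labels. This is a minimal sketch; the row and column order pandas produces has varied between versions, so both axes are sorted before the two results are compared.

```python
import pandas as pd

d = {'test2': {'hi': 4, 'bye': 3},
     'religion.christian_20674': {'path': 1, 'religious': 1, 'hi': 1}}

# orient='index': outer keys become row labels, inner keys become columns
df_index = pd.DataFrame.from_dict(d, orient='index').fillna(0)

# The construction used in the answer: outer keys become columns, then transpose
df_t = pd.DataFrame(d).T.fillna(0)

# Label order can differ between pandas versions, so sort both axes before comparing
a = df_index.sort_index().sort_index(axis=1)
b = df_t.sort_index().sort_index(axis=1)
print((a.to_numpy() == b.to_numpy()).all())
```

Both calls give one row per outer key, so neither produces the 5-row layout the question asks for.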

answered Nov 25 '18 at 03:55 by The Pineapple, edited Nov 25 '18 at 04:02

@pineapple thanks for the answer, but it's not in the shape I would like to have. I actually want each column of each item to be in a new row. In my desired output shown above, I have 5 rows; each value of the nested dictionary should be converted into a new row. Do you have any idea how to do that? – sariii Nov 25 '18 at 16:51

@sariii It looks like tel beat me to it. If you have any other questions, feel free to ask. – The Pineapple Nov 25 '18 at 21:52

tel · Accepted Answer · 2018-11-25T18:31:24.123

There does not seem to be any built-in way in Pandas or Numpy to split up your rows the way you want. Happily, you can do so with a single dictionary comprehension. The splitsubdicts function shown below provides this dict comprehension, and the todf function wraps up the whole conversion process:

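import pandas as pd

def splitsubdicts(d):
    # One single-item dict per (outer key, inner key) pair, labelled '<outer key>_<position>'
    return {('%s_%d' % (k0, i + 1)):{k1:v1} for k0,v0 in d.items() for i,(k1,v1) in enumerate(v0.items())}

def todf(d):
    # splitsubdicts is applied twice; after the first pass every sub-dict already holds a
    # single item, so the second pass only appends another '_1' to each row label
    # .fillna(0) replaces the missing data with 0 (by default NaN is assigned to missing data)
    return pd.DataFrame(splitsubdicts(splitsubdicts(d))).T.fillna(0)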

You can use todf like this:

d = {'Test2': {'hi':4, 'bye':3}, 'religion.christian_20674': {'path': 1, 'religious': 1, 'hi':1}}
df = todf(d)
print(df)

Output:

                              bye   hi  path  religious
Test2_1_1                     0.0  4.0   0.0        0.0
Test2_2_1                     3.0  0.0   0.0        0.0
religion.christian_20674_1_1  0.0  0.0   1.0        0.0
religion.christian_20674_2_1  0.0  0.0   0.0        1.0
religion.christian_20674_3_1  0.0  1.0   0.0        0.0
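To see where the `_1_1` suffixes in these row labels come from, here is a small sketch that prints the intermediate dict after a single `splitsubdicts` pass and then applies a second pass, as `todf` does. It only reuses the comprehension from the answer; `once` and `twice` are illustrative names.

```python
def splitsubdicts(d):
    # Same comprehension as in the answer: one single-item dict per (outer, inner) pair
    return {('%s_%d' % (k0, i + 1)): {k1: v1}
            for k0, v0 in d.items()
            for i, (k1, v1) in enumerate(v0.items())}

d = {'Test2': {'hi': 4, 'bye': 3},
     'religion.christian_20674': {'path': 1, 'religious': 1, 'hi': 1}}

once = splitsubdicts(d)
for label, sub in once.items():
    print(label, sub)
# Test2_1 {'hi': 4}
# Test2_2 {'bye': 3}
# religion.christian_20674_1 {'path': 1}
# religion.christian_20674_2 {'religious': 1}
# religion.christian_20674_3 {'hi': 1}

# Every sub-dict already has one item, so a second pass only appends '_1' to each label
twice = splitsubdicts(once)
print(list(twice))
```

The first pass already produces the 5 rows; the second pass changes only the labels, not the data.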

If you actually want a Numpy array, you can easily convert the dataframe:

arr = df.values
print(arr)

Output:

[[0. 4. 0. 0.]
 [3. 0. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]
 [0. 1. 0. 0.]]
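If only the bare NumPy array is needed, a sketch along these lines skips the intermediate dictionaries and the DataFrame entirely. It flattens the input to (row label, column, value) triples. Because each row holds exactly one non-zero entry, all of them can then be filled with a single fancy-indexing assignment. The `'<outer key>_<inner key>'` row labels, the alphabetical column order and the names `triples` and `col_pos` are illustrative assumptions, not part of the answer.

```python
import numpy as np

d = {'Test2': {'hi': 4, 'bye': 3},
     'religion.christian_20674': {'path': 1, 'religious': 1, 'hi': 1}}

# Flatten to (row label, column name, value): one triple per inner entry
triples = [('%s_%s' % (k0, k1), k1, v) for k0, v0 in d.items() for k1, v in v0.items()]

# Assumed column order: inner keys sorted alphabetically
columns = sorted({k1 for _, k1, _ in triples})
col_pos = {name: j for j, name in enumerate(columns)}

arr = np.zeros((len(triples), len(columns)))
rows = np.arange(len(triples))
cols = np.array([col_pos[k1] for _, k1, _ in triples])
# Row i gets its single value at column cols[i]; everything else stays 0
arr[rows, cols] = [v for _, _, v in triples]
print(arr)
```

The row labels stay available as `[label for label, _, _ in triples]` if you still need them alongside the array.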

You can also convert the dataframe to a structured array instead, which lets you keep your row and column labels:

arr = df.to_records()
print(arr.dtype.names)
print(arr)

Output:

('index', 'bye', 'hi', 'path', 'religious')
[('Test2_1_1', 0., 4., 0., 0.)
 ('Test2_2_1', 3., 0., 0., 0.)
 ('religion.christian_20674_1_1', 0., 0., 1., 0.)
 ('religion.christian_20674_2_1', 0., 0., 0., 1.)
 ('religion.christian_20674_3_1', 0., 1., 0., 0.)]

Edit: explanation of `splitsubdicts`

The nested dictionary comprehension used in splitsubdicts might seem confusing, but it is really just shorthand for writing nested loops. You can expand the comprehension into a pair of for loops as follows:

def splitsubdicts(d):
    ret = {}

    for k0,v0 in d.items():
        for i,(k1,v1) in enumerate(v0.items()):
            ret['{}_{}'.format(k0, i + 1)] = {k1: v1}

    return ret

The values returned by this loop-based version of splitsubdicts will be identical to those returned by the comprehension-based version above. The comprehension-based version might be slightly faster than the loop-based version, but in practical terms it's not the kind of thing anyone should worry about.
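A quick way to check both claims yourself: the sketch below compares the two versions' output, including key order, and times them with `timeit`. The function names `splitsubdicts_comprehension` and `splitsubdicts_loops` are hypothetical, chosen so that both versions can coexist. Timings depend on your machine and Python version, so no particular numbers are implied.

```python
import timeit

def splitsubdicts_comprehension(d):
    # Comprehension-based version from the answer
    return {('%s_%d' % (k0, i + 1)): {k1: v1}
            for k0, v0 in d.items()
            for i, (k1, v1) in enumerate(v0.items())}

def splitsubdicts_loops(d):
    # Loop-based version from the explanation above
    ret = {}
    for k0, v0 in d.items():
        for i, (k1, v1) in enumerate(v0.items()):
            ret['{}_{}'.format(k0, i + 1)] = {k1: v1}
    return ret

d = {'Test2': {'hi': 4, 'bye': 3},
     'religion.christian_20674': {'path': 1, 'religious': 1, 'hi': 1}}

# Same contents (dict equality) and same key order (list comparison)
print(splitsubdicts_comprehension(d) == splitsubdicts_loops(d))
print(list(splitsubdicts_comprehension(d)) == list(splitsubdicts_loops(d)))

t_comp = timeit.timeit(lambda: splitsubdicts_comprehension(d), number=10000)
t_loop = timeit.timeit(lambda: splitsubdicts_loops(d), number=10000)
print('comprehension: %.4fs, loops: %.4fs' % (t_comp, t_loop))
```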

Thanks for the answer, but it's not in the shape I would like to have. I actually want each column of each item to be in a new row. In my desired output shown above, I have 5 rows; each value of the nested dictionary should be converted into a new row. Do you have any idea how to do that? — sariii, Nov 25 '18 at 16:51
@sariii Ooooh, now I get what you were trying to do. I'll see what I can do — tel, Nov 25 '18 at 16:56
@sariii Okay, I've added a `splitsubdicts` function that splits up the rows like how you wanted, and a `todf` function that wraps the whole dict-to-dataframe conversion process. The output now matches your example exactly (aside from the column sort order). Is this what you were looking for? — tel, Nov 25 '18 at 18:11
Thank you so much for taking the time. Yes, it is exactly what I was looking for. I have a question about part of your implementation, but I will ask later :). Thanks again — sariii, Nov 25 '18 at 18:15
@sariii I could be wrong, but I'm assuming you're talking about the nested dict comprehension in `splitsubdicts`. I added an explanatory note about the syntax at the end of the answer. — tel, Nov 25 '18 at 18:32