Pandas Dataframe from matrix-like dictionary where keys are tuples of indices

Question

I have a dictionary whose keys are tuples of the form (i,j) and whose values are matrix entries.

So if you think of a mathematical matrix $A = (a_{i,j})$ then matrix_dict[(i,j)] would give the value of row i and column j.

I would like a pandas DataFrame in which the values matrix_dict[(i,0)] for i from 1 to m are the row names, the values matrix_dict[(0,j)] for j from 1 to n are the column names, and every value whose key (i,j) contains no 0 is an entry of the DataFrame at the corresponding row and column.

The dictionary would look like this:

matrix_dict = {
    (0, 0): r'RowIndex\ColumnIndex',
    (0, 1): 'Column1',
    (0, 2): 'Column2',
    (1, 0): 'Row1',
    (1, 1): 1,
    (1, 2): 2,
    (2, 0): 'Row2',
    (2, 1): 3,
    (2, 2): 4
}
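For reference, the DataFrame described above for this dictionary can be written out by hand. This is only an illustration of the target layout (`expected` is an illustrative name), not a conversion method:

```python
import pandas as pd

# Target layout: row names from the (i, 0) entries, column names from the
# (0, j) entries, and data from the (i, j) entries with i, j >= 1.
expected = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=['Row1', 'Row2'],
    columns=['Column1', 'Column2'],
)
print(expected)
```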

I thought it would be easy to convert this into a pandas DataFrame since the structure already matches in a way, but the solutions I found here using pd.DataFrame.from_dict address different problems, where the key tuple is supposed to become part of the DataFrame or a MultiIndex.

Please provide an explicit [minimal reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) of the input and the matching expected output — mozway, May 26 '23 at 13:18

mozway · Accepted Answer · 2023-05-26T13:38:16.370

If I understood correctly, use pandas.Series and unstack:

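import pandas as pd

dic = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4, (2, 2): 5}

# tuple keys become a 2-level MultiIndex; unstack pivots the inner level to columns
df = pd.Series(dic).unstack(fill_value=0)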

Output:

   0  1  2
0  1  2  0
1  3  4  0
2  0  0  5

You can also reindex to a fixed shape of m rows and n columns, filling missing positions with 0:

m, n = 4, 5

df = (pd.Series(dic).unstack(fill_value=0)
        .reindex(index=range(m), columns=range(n), fill_value=0)
     )

Output:

   0  1  2  3  4
0  1  2  0  0  0
1  3  4  0  0  0
2  0  0  5  0  0
3  0  0  0  0  0

Updated question:

matrix_dict = {
    (0, 0): r'RowIndex\ColumnIndex',
    (0, 1): 'Column1',
    (0, 2): 'Column2',
    (1, 0): 'Row1',
    (1, 1): 1,
    (1, 2): 2,
    (2, 0): 'Row2',
    (2, 1): 3,
    (2, 2): 4
}

m, n = 2, 2

df = (pd.Series(matrix_dict).unstack(fill_value=0)
        .reindex(index=range(m+1), columns=range(n+1), fill_value=0)
        .set_index(0)
        .pipe(lambda d: d.set_axis(d.iloc[0], axis=1).iloc[1:])
        .rename_axis(index=None, columns=None)
     )

Output:

     Column1 Column2
Row1       1       2
Row2       3       4
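As an alternative to the reindex/set_index/set_axis chain, and assuming the dictionary is complete (every (i, j) key present), one could unstack into the full grid and slice off the header row and column with `iloc`. This is a sketch of a different technique, not the answer's own method; it also applies the `.astype(int)` step suggested in the comments, which assumes every data entry is an integer:

```python
import pandas as pd

matrix_dict = {
    (0, 0): r'RowIndex\ColumnIndex',
    (0, 1): 'Column1', (0, 2): 'Column2',
    (1, 0): 'Row1', (1, 1): 1, (1, 2): 2,
    (2, 0): 'Row2', (2, 1): 3, (2, 2): 4,
}

# Unstack into the full (m+1) x (n+1) grid, headers included (object dtype).
grid = pd.Series(matrix_dict).unstack()

# Row 0 holds the column names, column 0 holds the row names, the rest is data.
# astype(int) assumes all data entries are integers.
df = pd.DataFrame(
    grid.iloc[1:, 1:].to_numpy(),
    index=grid.iloc[1:, 0].to_list(),
    columns=grid.iloc[0, 1:].to_list(),
).astype(int)
print(df)
```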

Bonus, using the (0, 0) entry to name the axes:

df = (pd.Series(matrix_dict).unstack(fill_value=0)
        .reindex(index=range(m+1), columns=range(n+1), fill_value=0)
        .set_index(0)
        .pipe(lambda d: d.set_axis(d.iloc[0], axis=1).iloc[1:])
        .rename_axis(**dict(zip(('index', 'columns'),
                                matrix_dict[(0, 0)].split('\\'))))
     )

Output:

ColumnIndex Column1 Column2
RowIndex                   
Row1              1       2
Row2              3       4
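The bonus relies on the (0, 0) label containing a single backslash. A minimal sketch of what the `rename_axis` keyword arguments evaluate to (`corner` and `axis_names` are illustrative names):

```python
# The (0, 0) label, written as a raw string so the backslash is literal.
corner = r'RowIndex\ColumnIndex'

# split('\\') splits on a single backslash; zip pairs the two parts with
# the keyword names that rename_axis accepts.
axis_names = dict(zip(('index', 'columns'), corner.split('\\')))
print(axis_names)  # {'index': 'RowIndex', 'columns': 'ColumnIndex'}
```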

It might be asking too much for another bonus lol, but could you briefly explain the logic behind the Series -> unstack approach? I don't quite see how that works, since a Series only runs in one direction. — work flow, May 26 '23 at 14:06
@workflow sure, your input is a 1D structure similar to a Series, so it makes sense to start with that. This gives you a Series with a two-level MultiIndex. Unstacking moves the inner level into the columns and groups the values that share the same outer index into one row, which produces the 2D shape you wanted. After that, a bit of reworking moves the headers into place. Btw, if the final DataFrame contains only integers, I suggest adding an `.astype(int)` step so you have a proper dtype to work with ;) — mozway, May 26 '23 at 14:43
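The two steps described in the comment above can be traced on the small example dictionary from the accepted answer (a sketch; `s` is an illustrative name for the intermediate Series):

```python
import pandas as pd

dic = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4, (2, 2): 5}

# Step 1: the tuple keys become a two-level MultiIndex on a 1D Series.
s = pd.Series(dic)
print(s.index.nlevels)  # 2

# Step 2: unstack moves the last level (the j values) into the columns,
# collecting entries that share the first level (the i values) into one row;
# positions with no key, such as (0, 2), are filled with 0.
df = s.unstack(fill_value=0)
print(df)
```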

score 1 · Answer 2 · answered May 26 '23 at 13:33

This should work:

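import pandas as pd

# matrix_dict is the dictionary from the question;
# m and n are the numbers of data rows and columns (2 and 2 there)
m, n = 2, 2

df = pd.DataFrame()

# one column per header (0, j), filled with the values from rows 1..m
for j in range(1, n+1):
    df[matrix_dict[(0, j)]] = [matrix_dict[(i, j)] for i in range(1, m+1)]

# the row names come from the (i, 0) entries
df.index = [matrix_dict[(i, 0)] for i in range(1, m+1)]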
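This answer needs m and n up front. A possible sketch, assuming the indices run contiguously from 0, infers them from the largest key indices and builds the frame in one constructor call instead of column by column:

```python
import pandas as pd

matrix_dict = {
    (0, 0): r'RowIndex\ColumnIndex',
    (0, 1): 'Column1', (0, 2): 'Column2',
    (1, 0): 'Row1', (1, 1): 1, (1, 2): 2,
    (2, 0): 'Row2', (2, 1): 3, (2, 2): 4,
}

# Infer the number of data rows/columns from the largest indices
# (assumes no gaps between 0 and the maximum).
m = max(i for i, _ in matrix_dict)
n = max(j for _, j in matrix_dict)

# Dict of columns: header (0, j) -> values (1..m, j); row names from (i, 0).
df = pd.DataFrame(
    {matrix_dict[(0, j)]: [matrix_dict[(i, j)] for i in range(1, m + 1)]
     for j in range(1, n + 1)},
    index=[matrix_dict[(i, 0)] for i in range(1, m + 1)],
)
print(df)
```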
