
Using pandas, given this dictionary which has tuples as keys:

dictionary = {("a", "c"): 1, ("a", "d"): 3, ("b", "c"): 2, ("b", "d"): 4}

How can I get a dataframe like so?

   a  b
c  1  2
d  3  4

I thought about using df.at[] to assign the correct values to each row/column location - e.g. df.at[a,c] = 1. However, I'm not clear on how to use the tuple with .at[].

wjandrea
David
  • `df.at[key] = value`? Is that what you're looking for? – wjandrea Sep 02 '23 at 14:25
  • What dataframe library are you using, Pandas? Please add the tag for it. BTW, welcome to Stack Overflow! Check out the [tour], and [ask] if you want tips. – wjandrea Sep 02 '23 at 14:27
  • What should go into cells that *aren't* specified by the dictionary? What column dtypes should be used? – Karl Knechtel Sep 02 '23 at 14:29
  • @wjandrea as far as I'm aware, Pandas is the only Python library that uses the name "dataframe" for a type that it creates. – Karl Knechtel Sep 02 '23 at 14:30
  • Hi wjandrea! I'm using Pandas. Thanks for your help, I'll give df.at[key] = value a try. Sorry if it was a basic question, I'm a beginner at coding – David Sep 02 '23 at 14:33
  • @Karl Polars also has a [`DataFrame`](https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/index.html). Although, if people don't mention it, I assume they're using Pandas, but I like to ask to make sure. – wjandrea Sep 02 '23 at 15:02

2 Answers


I would make a MultiIndex Series using the constructor, then unstack its outer level:

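import pandas as pd

dictionary = {("a", "c"): 1, ("a", "d"): 3, ("b", "c"): 2, ("b", "d"): 4}

# Tuple keys become a two-level MultiIndex; unstack(0) moves the first level to the columns.
df = pd.Series(dictionary).unstack(0)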

Output:

print(df)

   a  b
c  1  2
d  3  4
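
As a side note on the question raised in the comments (what goes into cells the dictionary doesn't specify), here is a minimal sketch. The `sparse` dictionary is a hypothetical input made up for illustration, not the asker's data. As far as I know, missing combinations come out as NaN, and `unstack` accepts a `fill_value` argument if you'd rather have a placeholder:

```python
import pandas as pd

# Hypothetical sparse input: ("a", "d") and ("b", "c") are deliberately missing.
sparse = {("a", "c"): 1, ("b", "d"): 4}

# Without fill_value, the unspecified cells become NaN.
with_nan = pd.Series(sparse).unstack(0)

# fill_value supplies a replacement for the unspecified cells instead.
filled = pd.Series(sparse).unstack(0, fill_value=0)

print(with_nan)
print(filled)
```

Whether 0 is a sensible placeholder depends on what the values mean, so treat that choice as an assumption.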
Timeless
  • Should add that if the dtypes were different per column, you'd want a different approach. But this works well where they're all ints. – wjandrea Sep 02 '23 at 15:27

You could loop through your dictionary and build a new dictionary of dictionaries, where the outer dictionary's keys are the column names and the inner dictionaries' keys are the row labels. To save a few lines of code, I'm going to use a defaultdict(dict) as the outer dictionary:

from collections import defaultdict
import pandas as pd

dictionary = {('a','c'): 1, ('a','d'): 3,
              ('b','c'): 2, ('b','d'): 4}


dd = defaultdict(dict)

for (col_name, row_name), value in dictionary.items():
    dd[col_name][row_name] = value

This results in the following dd:

defaultdict(<class 'dict'>, {'a': {'c': 1, 'd': 3}, 'b': {'c': 2, 'd': 4}})

Finally, use this to create your dataframe:

df = pd.DataFrame.from_dict(dd)

Which gives the desired dataframe:

   a  b
c  1  2
d  3  4
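
For comparison, the df.at idea from the question could be sketched roughly like this. It assumes you first build an empty frame from the labels found in the keys; the names `rows`, `cols` and `df_at` are mine, purely for illustration. Note that .at takes the row label first and the column label second, so the tuple has to be unpacked and swapped:

```python
import pandas as pd

dictionary = {("a", "c"): 1, ("a", "d"): 3, ("b", "c"): 2, ("b", "d"): 4}

# Collect the column labels (first tuple element) and row labels (second element).
cols = sorted({col for col, _ in dictionary})
rows = sorted({row for _, row in dictionary})

# Empty frame with the right labels; cells start out as NaN with object dtype.
df_at = pd.DataFrame(index=rows, columns=cols)

# .at expects [row_label, column_label], so swap the order of the tuple.
for (col, row), value in dictionary.items():
    df_at.at[row, col] = value

# Every cell is filled here, so converting to int is safe for this data.
df_at = df_at.astype(int)
print(df_at)
```

This works, but filling a frame cell by cell is generally slower and more verbose than building the nested dictionary first, which is why I'd still go with the approach above.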
Pranav Hosangadi
  • Nice! This is a better approach than what OP had in mind; cf. this other SO question: [Creating an empty Pandas DataFrame, and then filling it](/q/13784192/4518341) – wjandrea Sep 02 '23 at 15:38