1

I want to reformat my nested dictionary before outputting it into csv. My nested dictionary:

review = {'Q1': {'Question': 'question wording','Answer': {'Part 1': 'Answer part one', 'Part 2': 'Answer part 2'} ,'Proof': {'Part 1': 'The proof part one', 'Part 2': 'The proof part 2'}},
      'Q2': {'Question': 'question wording','Answer': {'Part 1': 'Answer part one', 'Part 2': 'Answer part 2'} ,'Proof': {'Part 1': 'The proof part one', 'Part 2': 'The proof part 2'}}}

So far I have tried:

my_df = pd.DataFrame(review)
my_df = my_df.unstack()

and get part way:

Q1  Answer      {'Part 1': 'Answer part one', 'Part 2': 'Answe...
    Proof       {'Part 1': 'The proof part one', 'Part 2': 'Th...
    Question                                     question wording
Q2  Answer      {'Part 1': 'Answer part one', 'Part 2': 'Answe...
    Proof       {'Part 1': 'The proof part one', 'Part 2': 'Th...
    Question                                     question wording

but I want it to look like this in the end:

Index   Question                Answer          Proof
Q1      question one wording    Answer part 1   Proof part 1
Q1      question one wording    Answer part 2   Proof part 2
Q2      question two wording    Answer part 1   Proof part 1
Q2      question two wording    Answer part 2   Proof part 2

So I need to melt/unstack/pivot/expand/other_manipulation_word the nested dictionary in the Dataframe.

I have looked at this for guidance but can't apply it to my own: Expand pandas dataframe column of dict into dataframe columns

Sandy
  • 201
  • 2
  • 9
  • I'm not sure how to get the exact layout you want, but df.reset_index will achieve the desired effect on the index column. – Lucas Apr 12 '19 at 11:35

1 Answers1

2

Here is one potential solution:

1) Create the initial DataFrame with orient 'index'

df = pd.DataFrame.from_dict(review, orient='index')

2) Create the shape of the final DataFrame using Index.repeat, Series.str.len and DataFrame.loc

df_new = df.loc[df.index.repeat(df.Answer.str.len())]

3) Fix 'Answer' and 'Proof' columns by passing to DataFrame contructor and using stack values

df_new['Answer'] = pd.DataFrame(df.Answer.tolist()).stack().values
df_new['Proof'] = pd.DataFrame(df.Proof.tolist()).stack().values
print(df_new)

            Question           Answer               Proof
Q1  question wording  Answer part one  The proof part one
Q1  question wording    Answer part 2    The proof part 2
Q2  question wording  Answer part one  The proof part one
Q2  question wording    Answer part 2    The proof part 2
Chris Adams
  • 18,389
  • 4
  • 22
  • 39
  • Is to_list() a function from Pandas? I don't see it here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html and I get an error: AttributeError: 'Series' object has no attribute 'to_list' – Sandy Apr 12 '19 at 12:40
  • see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.to_list.html – Chris Adams Apr 12 '19 at 12:42
  • 1
    you may be using an older version of pandas...? try `tolist()` without the underscore? – Chris Adams Apr 12 '19 at 12:43
  • This works well but I'm surprised there isn't a method without forcing it. I can get the correct format with just one question i.e. Q1, doing: df = pd.DataFrame.from_dict(review['Q1']) print(df.T.unstack()) – Sandy Apr 12 '19 at 12:56