Dataframe and pandas

Question

I'm working with a dictionary that has multiple dictionaries stored. It looks like this(this one is pretty shortened).

dict = {('random_number_1', 'random_number_2'): {'random_string_1': 'random_number_3'}, ('random_number_1', 'random_number_4'): {'random_string_2': 'random_number_5'}, ('random_number_1', 'random_number_6'): {'random_string_3': 'random_number_7', 'random_string_4': 'random_number_8', 'random_string_5': 'random_number_10'}}

{('random_number_11','random_number_13'): {'random_string_6': 'random_number_14', 'random_string_7': 'random_number_15'}}

I wish to write excel file that would look like. Example

I used code:

newdict = {(k1, k2):v2 for k1,v1 in dict.items() \
                   for k2,v2 in dict[k1].items()}


df = pd.DataFrame([newdict[i] for i in sorted(newdict)],
              index=pd.MultiIndex.from_tuples([i for i in sorted(newdict.keys())]))

and tried to make dataframe with it, it looks satisfying, but when I wan't to write it to excel(with df.to_excel() I get an error : "Unsupported type 'tuple' in write()". I think that problem lies in my dataframe. When I print it out it looks like.

                                 0
(random_number_1, random_number_2) random_string_1 random_number_3

(random_number_1, random_number_4) random_string_2 random_number_5

(random_number_1, random_number_6) random_string_3 random_number_7

                                   random_string_4 random_number_8

                                   random_string_5 random_number_9    
                                  0
(random_number_10, random_number_11) random_string_6 random_number_12

                                    random_string_7 random_umber_13

Those 0 are at the end of the row. Ps: I'm sorry about not posting my code properly. Didn't know how to make indents without making a code. Hope I didn't cause someone else too much work with editing. Thank you in advance.

score 0 · Accepted Answer · answered Jun 27 '18 at 13:52

0

Not sure it is the best way, but at least from where you are at with the definition of df, you need to reset a level of index and then change the index from tuples to multiindex. With your newdict then do:

df = (pd.DataFrame( [newdict[i] for i in sorted(newdict)],
                    index=pd.MultiIndex.from_tuples([i for i in sorted(newdict.keys())]))
        .reset_index(level=1) # to make random_string as a column not a level of index
        .rename(columns={'level_1':'string',0:'number'})) #this is cosmetic. not necessary
df.index = pd.MultiIndex.from_tuples(df.index) # create the multiindex from current tuples in index

df is like

                                            string            number
random_number_1  random_number_2   random_string_1   random_number_3
                 random_number_4   random_string_2   random_number_5
                 random_number_6   random_string_3   random_number_7
                 random_number_6   random_string_4   random_number_8
                 random_number_6   random_string_5  random_number_10
random_number_11 random_number_13  random_string_6  random_number_14
                 random_number_13  random_string_7  random_number_15

which give you something close to what you want when doing df.to_excel

answered Jun 27 '18 at 13:52

Ben.T

29,160
6
32
54

Thank you for your answer! I tried your suggestion it works! But it only prints one random value. It would only print the random_number_1 value rows. Any idea why? Thank you in advance! – Noob Programmer Jun 27 '18 at 14:05
@NoobProgrammer I did `df.to_excel('myEx.xlsx')` myself and myEx contains all values displayed above. Not sure what happens for you. – Ben.T Jun 27 '18 at 14:10
@NoobProgrammer and the "0 are at the end of the row" is the automatic name of the column with `random_number_3, etc. ` as you don't define a name for columns when using `pd.DataFrame` :) – Ben.T Jun 27 '18 at 14:14
I recreated what my print output is in word. I think that multiple bold points cause my problems, cause they "break" the dataframe? is it possible? Since it only writes one of the whole data. Picture for easier visualisation : https://imgur.com/a/s3utBsF – Noob Programmer Jun 27 '18 at 14:30
@NoobProgrammer if it's really a `print`, the result in your excel should not be like this (it might just be a representation way of your print function), but to be fair I have never seen something like "break" the dataframe. sorry, not sure I can help you further if in your excel is like this :( – Ben.T Jun 27 '18 at 14:50
Yeah I think program makes a different dataset for every first row. That's why i have multiple "string" and "number" rows, even though l would like to have only top "string" and "number" box. And excel writes only one "string" and "number" object. If that makes sense. Thank you so much, you've been great help :) – Noob Programmer Jun 28 '18 at 09:13
It stil hasn't solved my problem, since not whole dataframe is written in excel(only one set of data in "string" and "number" indicies, but you solved my problem with columns and organizing :) thank you! – Noob Programmer Jun 28 '18 at 14:06
@NoobProgrammer sorry I misunderstood what you said then (and now I understand), do you get the same problem with `df.to_csv`? – Ben.T Jun 28 '18 at 14:14
1

Its ok, It's hard to explain:) yes df.to_csv gives me even worse output. It's only two columns and two rows. Altogether combined. But again only last set of data: from last "number" and "string" down. – Noob Programmer Jun 28 '18 at 14:24
Hey again thank you for you time, I have another question, how would I rename first two columns, no matter with .rename it always changes only string/number columns. Thanks! – Noob Programmer Jul 03 '18 at 07:08
@NoobProgrammer the first two columns you mean the indexes? – Ben.T Jul 03 '18 at 13:18
Yes and How did you manage to merge fiirst two columns I am trying with df.set_index but its not working. Made a seperate question here. https://stackoverflow.com/questions/51172234/merging-values-in-dataframe-to-write-in-excel – Noob Programmer Jul 05 '18 at 06:33
@NoobProgrammer to name the indexes, you can do it when creating the multiindex by adding the paramater `names` such as: `df.index = pd.MultiIndex.from_tuples(df.index,names=['name1','name2'])` – Ben.T Jul 05 '18 at 12:51

Dataframe and pandas

1 Answers1