8

how to create a dictionary of pandas dataframes, and return the dataframes into excel worksheets?

Hi All,

I am learning pandas and python, and I want to create a dictionary which contains a some dataframes, which I can then run metrics over each dataframe. With each unique cluster name (one of the columns) I would like to create a dataframe (subset of original dataframe.

Then I would like to be able to select it, run metrics over it, putting the results in a new dataframe, and then place the original dataframe (each subset) into a separate worksheet using xlsxwriter python library.

#create dictionary object

    c_dict = {}

#get a list of the unique names

c_dict= data.groupby('Cluster').groups

#create a dictionary of dataframes, one for each cluster

for cluster in c_dict.items():
    df = data[data['Cluster']==cluster
    c_dict[cluster] =df                                                        <<< im getting invalid syntax here

#go through the dictionary and create a worksheet and put the dataframe in it.

for k,v in c_dict.items():
    dataframe = GetDF(k)                                                            <<< creating worksheets and puts the data from the dataframe > worksheet is not working because of invalid syntax when trying to create dataframe dictionary ^^
    dataframe.to_excel(writer,sheet_name=k)
writer.save

#get the dataframe from the dictionary,

GetDF(dictionary_key)
          return c_dict[dictionary_key]
yoshiserry
  • 20,175
  • 35
  • 77
  • 104
  • There are a lot of problems with your code. You assign c_dist={}, then immediately assign c_dict to a groupby object. You also don't call the save method on writer, you just reference the function. You also check for the equality of c_dict[cluster] to df, it's not clear why that's happening. – munk Feb 25 '14 at 02:49
  • I have a large dataset. I want to create smaller datasets (dataframes) based on the unique values of the cluster column. In that code I wanted to create the smaller datasets(dataframes) and assign them to a dictionary so I could call them later. How do you call rather than reference the writer object?> – yoshiserry Feb 25 '14 at 02:55
  • To call a function, you need parenthesis after it with the appropriate argument list. You're not calling the object, you're calling a method on the object. I'm guessing you meant writer.save(). You're probably better off just using the groupby object, where you can reference a group, rather than trying to stick it into a dict. – munk Feb 25 '14 at 03:00
  • I'd like to learn how to stick it into the dict, but thanks for the details around calling functions. I'll read the doco – yoshiserry Feb 25 '14 at 03:19
  • so does anyone actually know how to format my dictionary so that I can store dataframes as I have described? – yoshiserry Feb 25 '14 at 09:06

1 Answers1

6

I think this is what you're looking for. As I said in the comments, it's probably not the right solution and it's definitely not idomatic for pandas DataFrames.

import pandas as pd

groups = data.groupby('Cluster')

#create a dictionary of dataframes, one for each cluster
c_dict = {k: pd.DataFrame(v) for k, v in groups.groups.iteritems() }

If you want to save this to an excel file, the documentation is here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html

There is a nice example at the bottom that will do what you need. Hint: use for k,v in myDict.iteritems() to get keys and values.

munk
  • 12,340
  • 8
  • 51
  • 71
  • Thanks I was under the impression I should use dict.items? What is iter items as a-posed to that? And pandas idiomatic? – yoshiserry Feb 25 '14 at 19:25
  • It actually depends on whether you're using python 2 or 3. There is an explanation [here](http://stackoverflow.com/questions/10458437/python-what-is-the-difference-between-dict-items-and-dict-iteritems). iteritems() doesn't make a copy, which can be important if you have large data sets. For programming idioms see [this wikipedia article](http://en.wikipedia.org/wiki/Programming_idiom) and the [Pandas Cookbook](http://pandas.pydata.org/pandas-docs/stable/cookbook.html) – munk Feb 25 '14 at 19:33
  • @yoshiserry did you solve this? Could you post your solution I need to do something simlar. – ctrl-alt-delete Feb 24 '16 at 09:32
  • I got an error in your code with `iteritems()`. So I replaced `groups.groups.iteritems()` with just `groups` and it worked for me – Vibhu Jun 29 '18 at 06:30
  • for python3 you need to use items instead of iteritems – Daniel Sobrado Oct 29 '20 at 11:08