I hope this doesn't come across as an open-ended question for discussion, so here are the details of my specific case.

I am new to Pandas and I need to store several 2D arrays, where columns represent frequencies and rows represent directions (2D wave spectra, if you are curious). Each array represents a specific time.

I am storing these arrays as Pandas DataFrames, but for keeping them in a single object I thought of 2 options:

  1. Storing the DataFrames in a dictionary where the key is the time stamp.

  2. Storing the DataFrames in a Pandas Panel where the item is the time stamp.

The first option seems simple and has the flexibility to store arrays with different sizes, indexes and column names. The second option seems better for processing the data, since Panels have specific methods, and can also be easily saved or exported (e.g. to csv or pickle).
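For reference, a minimal sketch of the first option might look like the following. The direction and frequency values, the time stamps, and the names `spectra` and `shapes` are made up for illustration; only the layout (rows as directions, columns as frequencies, one DataFrame per time stamp) comes from the description above.

```python
import numpy as np
import pandas as pd

# Hypothetical axes: rows are directions (degrees), columns are frequencies (Hz)
dirs = [0, 90, 180, 270]
freqs = [0.05, 0.10, 0.15]

# Hypothetical time stamps, one per spectrum
stamps = pd.to_datetime(
    ["2016-02-25 00:00", "2016-02-25 01:00", "2016-02-25 02:00"]
)

# Option 1: a plain dictionary keyed by time stamp
spectra = {}
for i, stamp in enumerate(stamps):
    data = np.full((len(dirs), len(freqs)), float(i))  # placeholder values
    spectra[stamp] = pd.DataFrame(data, index=dirs, columns=freqs)

# A frame with a different shape and different labels fits in the same dictionary
spectra[pd.Timestamp("2016-02-25 03:00")] = pd.DataFrame(
    np.zeros((2, 2)), index=[0, 180], columns=[0.05, 0.10]
)

# Shape of each stored spectrum, keyed by time stamp
shapes = {stamp: df.shape for stamp, df in spectra.items()}
```

The dictionary places no constraint on the individual frames, which is where the flexibility comes from; the cost is that any operation across time steps has to loop over the keys.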

Which of the two options is better suited in terms of: speed, memory use, flexibility and data analysis?

Regards

jcdoming
  • pandas dataframe. I don't see how this doesn't have the flexibility you described for option 1. Perhaps a simple repeatable example would illuminate things? – kilojoules Feb 25 '16 at 20:37
  • Sorry, I'm editing my question. I didn't explain I already stored the data in Data Frames. My question is what is the best option for keeping the DataFrames together in a single object. – jcdoming Feb 25 '16 at 20:41
  • Have you considered nesting data frames? – kilojoules Feb 25 '16 at 20:43
  • No, I didn't know that was an option. Would that be better than a Panel though? Maybe more flexible, for DataFrames of different sizes. – jcdoming Feb 25 '16 at 20:46
  • Trying with an example would illuminate that. – kilojoules Feb 25 '16 at 20:47
  • I will and let you know. Thanks. – jcdoming Feb 25 '16 at 20:52
  • Cool. Also check out this question about dataframes with columns of different lengths: https://stackoverflow.com/questions/19736080/creating-dataframe-from-a-dictionary-where-entries-have-different-lengths – kilojoules Feb 25 '16 at 20:52

1 Answer

I don't think you need a panel. I recommend a nested dataframe approach.
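One possible reading of "nested dataframe" is a single DataFrame with a two-level row index, built with `pd.concat` from the per-time-step frames. This is only a sketch under that assumption: the axis values, time stamps, and the level names `time` and `direction` are invented for illustration. It also sidesteps `Panel`, which was later deprecated and removed from pandas (in version 0.25).

```python
import numpy as np
import pandas as pd

# Hypothetical axes and time stamps
dirs = [0, 90, 180, 270]
freqs = [0.05, 0.10, 0.15]
stamps = pd.to_datetime(["2016-02-25 00:00", "2016-02-25 01:00"])

# One DataFrame per time stamp, collected in a dictionary first
frames = {
    stamp: pd.DataFrame(
        np.full((len(dirs), len(freqs)), float(i)), index=dirs, columns=freqs
    )
    for i, stamp in enumerate(stamps)
}

# Stack the frames into one DataFrame; the dictionary keys become the
# outer level of a MultiIndex on the rows
combined = pd.concat(frames, names=["time", "direction"])

# Selecting on the outer level recovers the spectrum for one time step
first = combined.loc[stamps[0]]
```

With this layout the whole collection can be written out in one call (for example with `combined.to_csv(...)` or `combined.to_pickle(...)`), and the dictionary built in the loop remains the intermediate step.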

kilojoules
  • Sorry. Could you tell me how to do this? I need to append each dataframe to the parent dataframe inside a loop and it doesn't seem to work. Maybe some sample code would help. This is how I am doing it so far using dictionaries: `E[dates] = pd.DataFrame(Aux,index=f,columns=dirs)`, where *E* is a dictionary, *dates* is a float, and *Aux*, *f* and *dirs* are lists. – jcdoming Feb 25 '16 at 21:52
  • A simple and repeatable example would make this tons easier to answer. What error is produced? Basically, you want to put `NaN`s where there are no valid entries. – kilojoules Feb 25 '16 at 21:59
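
To illustrate the `NaN` remark in the last comment: when frames of different sizes are combined with `pd.concat`, pandas takes the union of the column labels and fills the missing cells with `NaN`. The frames `a` and `b` and the keys `"t0"` and `"t1"` below are hypothetical stand-ins, not the poster's data.

```python
import numpy as np
import pandas as pd

# Two hypothetical spectra with different directions and frequencies
a = pd.DataFrame(np.ones((2, 3)), index=[0, 180], columns=[0.05, 0.10, 0.15])
b = pd.DataFrame(np.ones((3, 2)), index=[0, 90, 180], columns=[0.05, 0.10])

# Rows are stacked under the keys "t0" and "t1"; the columns become the
# union of both frames' columns, so b has no value for 0.15
combined = pd.concat({"t0": a, "t1": b})

# Cells that had no valid entry in the source frame are NaN
missing = combined.loc["t1"][0.15]
n_missing = int(combined.isna().sum().sum())
```

Here `missing` holds the three `NaN` entries for the frequency that `b` does not cover, so frames of different sizes can still share one parent DataFrame.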