Pandas Coding practice: Is it better to build functions returning a DataFrame or Series?

This is a pretty fundamental question (apologies if it has already been asked), but it would be great to hear views on it. I am leaning towards Series, as it appears to be the more fundamental building block (i.e. indexing into a DataFrame returns a Series), but there are some limits on the functionality that can be applied to a Series. Equally, the same argument could be taken one step further to NumPy arrays, at which point I begin to lose development speed.
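
To make the two options concrete, here is a minimal sketch of the same computation returned either way. The column names and the functions `total_as_series` and `total_as_frame` are hypothetical, made up purely for illustration.

```python
import pandas as pd

# Hypothetical example data; the column names are assumptions.
df = pd.DataFrame({"price": [10.0, 12.5, 11.0], "qty": [3, 1, 2]})


def total_as_series(frame: pd.DataFrame) -> pd.Series:
    """Return one derived column as a named Series."""
    return (frame["price"] * frame["qty"]).rename("total")


def total_as_frame(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the same result wrapped in a single-column DataFrame."""
    return total_as_series(frame).to_frame()


col = df["price"]      # indexing with a single label gives a Series
sub = df[["price"]]    # indexing with a list of labels gives a DataFrame
s = total_as_series(df)
f = total_as_frame(df)
```

Converting between the two is cheap in code terms (`Series.to_frame()` one way, `frame["total"]` the other), so the choice may come down to which methods the caller needs on the result.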

Sam
  • 2
    You've not defined how the function will be used or what limitations you are facing wrt Series vs DataFrame, there are different methods available but this shouldn't be an issue – EdChum Feb 19 '15 at 09:23
  • 1
    This to me feels like asking whether it is better coding practice to write functions that return ints or floats - it really depends on what you are using the function for. – nullstellensatz Feb 19 '15 at 10:37

1 Answer

The most obvious constraint to consider is memory use when your functions run. There are several techniques for estimating memory usage (linked below), including writing your DataFrames to .csv files and checking their size in bytes. However, if you're working with a small dataset, managing several DataFrames shouldn't be an issue.

How to estimate how much memory a Pandas' DataFrame will need?
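
As a rough sketch of one in-memory alternative to the .csv approach, pandas exposes `memory_usage` on both DataFrame and Series. The data below is made up for illustration, and the numbers will differ for your own data, particularly with object (string) columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two numeric columns of 1000 rows each.
df = pd.DataFrame({
    "x": np.arange(1000, dtype="int64"),
    "y": np.linspace(0.0, 1.0, 1000),
})

# Bytes used by each column (index=False leaves the index out).
per_column = df.memory_usage(index=False, deep=True)

# Total bytes for the whole frame, index included.
frame_bytes = df.memory_usage(index=True, deep=True).sum()

# The same measurement for a single Series.
series_bytes = df["x"].memory_usage(index=False, deep=True)

print(per_column)
print("frame total:", frame_bytes, "bytes; Series 'x':", series_bytes, "bytes")
```

Passing `deep=True` asks pandas to inspect object columns rather than only counting pointers, which matters mostly for string data.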

That said, you can also write the function several ways and compare their process time statistics:

What do 'real', 'user' and 'sys' mean in the output of time(1)?
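
A minimal sketch of that comparison from inside Python, assuming two hypothetical functions `as_series` and `as_frame` that do the same work but return different types. Here `time.perf_counter()` approximates wall-clock ('real') time, and `time.process_time()` reports the combined user and system CPU time of the process.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical data and functions, for illustration only.
df = pd.DataFrame({"x": np.arange(100_000, dtype="float64")})


def as_series(frame: pd.DataFrame) -> pd.Series:
    return frame["x"] * 2.0


def as_frame(frame: pd.DataFrame) -> pd.DataFrame:
    return (frame["x"] * 2.0).to_frame("x2")


def measure(func, frame, repeats=20):
    """Return (wall_seconds, cpu_seconds) spent on `repeats` calls of func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for _ in range(repeats):
        func(frame)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


series_wall, series_cpu = measure(as_series, df)
frame_wall, frame_cpu = measure(as_frame, df)
print(f"Series:    wall={series_wall:.4f}s cpu={series_cpu:.4f}s")
print(f"DataFrame: wall={frame_wall:.4f}s cpu={frame_cpu:.4f}s")
```

Timings like these vary between runs and machines, so treat a single measurement as indicative at best.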

There is really no more detail I can provide without clarity/specificity around the question above.

unique_beast