
I would like to create a class that wraps a pandas DataFrame read from a CSV file. Is a @staticmethod the best way to do this, so that I do not have to read the DataFrame separately for each object?

user308827

2 Answers


You don't need a @staticmethod for this. You can pass the pandas DataFrame whenever you're creating instances of the class:

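import pandas as pd

class MyClass:

    def __init__(self, my_dataframe):
        self.my_dataframe = my_dataframe

# Read the CSV once; "data.csv" is a placeholder file name.
my_dataframe = pd.read_csv("data.csv")

a = MyClass(my_dataframe)
b = MyClass(my_dataframe)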

At this point, both a and b have access to the DataFrame that you've passed and you don't have to read the DataFrame each time. You can read the data from the CSV file once, create the DataFrame and construct as many instances of your class as you like (which all have access to the DataFrame).
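As a minimal sketch of the point above, the following reads the CSV text once and checks that both instances hold the very same DataFrame object. The in-memory CSV text and its column names are made up for illustration; a real program would pass a file path to `pd.read_csv` instead.

```python
import io

import pandas as pd


class MyClass:
    def __init__(self, my_dataframe):
        # Store a reference to the DataFrame; nothing is re-read or copied.
        self.my_dataframe = my_dataframe


# Stand-in for a CSV file on disk (hypothetical data).
csv_text = "name,value\nx,1\ny,2\n"
my_dataframe = pd.read_csv(io.StringIO(csv_text))  # parsed exactly once

a = MyClass(my_dataframe)
b = MyClass(my_dataframe)

# Both attributes refer to one and the same object.
same_object = a.my_dataframe is b.my_dataframe
```

Because no copy is made, constructing more instances costs almost nothing, but any in-place change made through one instance is visible through the others.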

Simeon Visser
  • thanks! so if I add a column to the dataframe via object a, and also add column with same name but different values, via object b; will it modify original dataframe? – user308827 Nov 15 '14 at 20:21
  • @user308827: eh, yes, as there's only one DataFrame object in your program at this point. You have to be aware that both instances modify the same DataFrame, so you can't assume the DataFrame will remain unchanged. Depending on what your code does, each instance will need to use the same DataFrame throughout. – Simeon Visser Nov 15 '14 at 20:23
  • thanks for the clear explanation. I want a subset of the dataframe for each object. is there any way I can make sure that all objects I create have separate subsets and only modify those subsets? – user308827 Nov 15 '14 at 20:27
  • You could read the DataFrame and then create a few smaller DataFrames from that big DataFrame. If you then pass the smaller DataFrames to the classes then there's no risk of modifying wrong columns/data. So you'd have to select the columns you want from a DataFrame and create a new DataFrame accordingly. – Simeon Visser Nov 15 '14 at 20:30
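The suggestion in the last comment could look roughly like the sketch below: select the columns each object needs and pass an explicit copy, so that each instance owns its own smaller DataFrame. The column names and the `flag` column are hypothetical, and `.copy()` is used here as an assumption about how to guarantee the subsets are independent of the big DataFrame.

```python
import pandas as pd


class MyClass:
    def __init__(self, my_dataframe):
        self.my_dataframe = my_dataframe


# Hypothetical "big" DataFrame standing in for the one read from CSV.
big = pd.DataFrame(
    {"id": [1, 2, 3], "height": [1.5, 1.6, 1.7], "weight": [50, 60, 70]}
)

# Each instance gets its own copy of the columns it needs.
a = MyClass(big[["id", "height"]].copy())
b = MyClass(big[["id", "weight"]].copy())

# Same column name, different values, added through different instances.
a.my_dataframe["flag"] = True
b.my_dataframe["flag"] = False
```

After this, `big` still has only its three original columns, and the two `flag` columns do not interfere with each other.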

I would think you could create the dataframe in the first instance with

a = MyClass(my_dataframe)

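and then make a copy of the DataFrame it holds (MyClass itself has no copy method, so call copy on the DataFrame)

b = MyClass(a.my_dataframe.copy(deep=True))

Then b is independent of a: changes made through b do not affect a's DataFrame.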
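A small sketch of the copying approach, assuming `MyClass` stores the DataFrame in a `my_dataframe` attribute as in the first answer. It copies the DataFrame itself with `DataFrame.copy(deep=True)` and wraps the copy in a new instance; the sample data is made up.

```python
import pandas as pd


class MyClass:
    def __init__(self, my_dataframe):
        self.my_dataframe = my_dataframe


original = pd.DataFrame({"value": [1, 2, 3]})  # hypothetical data

a = MyClass(original)
# Deep-copy the data so b does not share it with a.
b = MyClass(a.my_dataframe.copy(deep=True))

# Modify b's DataFrame only.
b.my_dataframe.loc[0, "value"] = 100
```

Here `a.my_dataframe` keeps its original first value while `b.my_dataframe` holds the modified one. The trade-off is memory: every deep copy duplicates the data.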

Roelant
Tom