I have written several functions that analyse the same data. The data lives in a large CSV file, and I don't want to read it more than once.

I have created a function whose only job is to provide the data. It checks whether the DataFrame is empty: if it is, it reads the file; if it already holds data, it simply returns that data.

To do this I have created a global variable to hold the data.

So my main file will have several globals of the form DATA = pd.DataFrame(), one per file, and for each file I load it like this:

def get_data():
    global DATA
    # Read the file only on the first call; later calls reuse the cached frame.
    if DATA.empty:
        DATA = pd.read_csv('file.csv')
    return DATA

The problem with using a global variable is that all my analysis functions have to be in the same file as the corresponding get_data functions.

As I create more functions I would love to be able to split them up into multiple files.

Nichlas H.

1 Answer

There is no reason why you can't split your methods the way you want. Try this:

# get_data.py

import pandas as pd

DATA = pd.DataFrame()

def get_data():
    global DATA
    # Read the CSV only on the first call; later calls return the cached frame.
    if DATA.empty:
        DATA = pd.read_csv('file.csv')
    return DATA

__all__ = ['get_data']

And then in your analysis modules:

# analysis_1.py

from get_data import get_data

data = get_data()

analyse(data)

...

Python caches imported modules in sys.modules, so once the get_data module has been loaded, importing it again from another file won't re-run the module contents. Every importer therefore shares the same DATA, and the global state lives in the get_data module itself.
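
To see the import caching in action without a real CSV file, here is a minimal sketch. The module name cached_data_demo, the LOAD_COUNT counter and the plain list standing in for the DataFrame are all made up for illustration; they are not part of the answer's code.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Stand-in for get_data.py. LOAD_COUNT is a hypothetical counter that only
# exists to make the caching visible; the list replaces the real DataFrame.
module_source = textwrap.dedent('''
    LOAD_COUNT = 0
    DATA = None

    def get_data():
        global DATA, LOAD_COUNT
        if DATA is None:
            LOAD_COUNT += 1      # stands in for the expensive pd.read_csv call
            DATA = [1, 2, 3]
        return DATA
''')

# Put the demo module in a temporary directory and make it importable.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "cached_data_demo.py").write_text(module_source)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

# Two separate imports, as two different analysis files would perform.
first = importlib.import_module("cached_data_demo")
second = importlib.import_module("cached_data_demo")

data_a = first.get_data()
data_b = second.get_data()

same_module = first is second    # both imports yield the same module object
same_data = data_a is data_b     # both callers receive the same cached object
load_count = first.LOAD_COUNT    # the "expensive" load ran only once

print(same_module, same_data, load_count)  # prints: True True 1
```

The sketch uses None as the "not loaded yet" marker instead of DATA.empty. That is a design choice of the demo, not of the answer: with DATA.empty, a CSV that really contains no rows would be re-read on every call.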

rdas