I have a Python script, let's call it MergeData.py, where I merge two data files. Since I have a lot of pairs of data files that have to be merged, I thought it would be good for readability to put the code in MergeData.py into a function, say merge_data(), and call this function in a loop over all my pairs of data files from a different Python script.
2 Questions:
Is it wise, in terms of speed, to call the function from a different file instead of running the code directly in the loop? (I have thousands of pairs that have to be merged.)
I thought that to use the function from MergeData.py I have to put from MergeData import merge_data at the top of my script. Within merge_data I use pandas, which I import in the main file with 'import pandas as pd'. When calling the function I get the error 'NameError: global name 'pd' is not defined'. I have tried every place I could think of to import the pandas module, even within the function, but the error keeps popping up. What am I doing wrong?
In MergeData.py I have
def merge_data(myFile1, myFile2):
    df1 = pd.read_csv(myFile1)
    df2 = pd.read_csv(myFile2)
    # ... my code
and in the other file I have
import pandas as pd
from MergeData import merge_data
# then some code to get my file names followed by
FileList = zip(FileList1, FileList2)
for myFile1, myFile2 in FileList:
    # Run Merging Algorithm
    dataEq = merge_data(myFile1, myFile2)
I am aware of "What is the best way to call a Python script from another Python script?", but I cannot really see whether it applies to my case.
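
For reference, here is a minimal, self-contained sketch of the two-file setup described above. It is not my actual code. To keep it runnable without extra packages, the standard-library csv module stands in for pandas, the two "files" are built from source strings, and the names load_module, merge_data_broken and merge_data_fixed are hypothetical. It illustrates that a function looks up global names in the module where it is defined, not in the script that calls it.

```python
import csv  # imported in the calling script, mirroring "import pandas as pd" there
import io
import types

# Hypothetical stand-in for MergeData.py WITHOUT its own imports.
BROKEN_SOURCE = '''
def merge_data(text1, text2):
    rows1 = list(csv.reader(io.StringIO(text1)))
    rows2 = list(csv.reader(io.StringIO(text2)))
    return rows1 + rows2
'''

# Same function, but the module imports what it uses at its own top level.
FIXED_SOURCE = "import csv\nimport io\n" + BROKEN_SOURCE


def load_module(name, source):
    """Build a module object from source text (assumed helper for this sketch)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)  # the function's globals become this module's namespace
    return module


broken = load_module("merge_data_broken", BROKEN_SOURCE)
fixed = load_module("merge_data_fixed", FIXED_SOURCE)

# The caller's own "import csv" is not visible inside the broken module.
try:
    broken.merge_data("a,b\n", "c,d\n")
    broken_error = None
except NameError as exc:
    broken_error = exc

# With the imports inside the module, the same call works.
merged = fixed.merge_data("a,b\n", "c,d\n")
print(type(broken_error).__name__, merged)
```

If the same holds for real files, then 'import pandas as pd' would need to sit at the top of MergeData.py itself, and the import in the main script would only be needed if the main script also uses pd directly.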