
I have a Python script, let's call it MergeData.py, in which I merge two data files. Since I have many pairs of data files to merge, I thought it would help readability to put the code in MergeData.py into a function, say merge_data(), and call this function from a different Python script in a loop over all my pairs of data files.

Two questions:

  1. Is it wise, in terms of speed, to call the function from a different file instead of running the code directly in the loop? (I have thousands of pairs that have to be merged.)

  2. I thought that to use the function in MergeData.py, I have to put from MergeData import merge_data at the top of my script. Within merge_data I use pandas, which I import in the main file with 'import pandas as pd'. When calling the function I get the error 'NameError: global name 'pd' is not defined'. I have tried every place I could think of to import the pandas module, even within the function, but the error keeps popping up. What am I doing wrong?

In MergeData.py I have

def merge_data(myFile1, myFile2):
    df1 = pd.read_csv(myFile1)
    df2 = pd.read_csv(myFile2)
    # ... my code

and in the other file I have

import pandas as pd
from MergeData import merge_data
# then some code to get my file names followed by
FileList = zip(FileList1,FileList2)

for myFile1,myFile2 in FileList:
    # Run Merging Algorithm
    dataEq = merge_data(myFile1,myFile2)

I am aware of the question "What is the best way to call a Python script from another Python script?", but I cannot really see whether it applies to my case.

user3820991

1 Answer

You need to move the line

import pandas as pd

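into the module in which the symbol pd is actually needed, i.e. move it out of your "other file" and into your MergeData.py file. Each module has its own global namespace, so a name imported in the calling script is not visible to functions defined in MergeData.py.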
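To see why the location of the import matters, here is a small self-contained sketch. It is not your actual code: it builds two throwaway modules from strings (so it runs without any extra files), and it uses the standard-library json module under the alias js as a stand-in for pandas and pd. The names load_module, to_text, broken_merge and fixed_merge are made up for illustration.

```python
import types


def load_module(name, source):
    """Build a module object from source text, roughly as `import` does from a file."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)  # the module's own dict becomes its global namespace
    return module


# Hypothetical stand-in for MergeData.py: uses the alias `js` without importing it.
BROKEN_SOURCE = '''
def to_text(value):
    return js.dumps(value)
'''

# The same module with the import moved into it.
FIXED_SOURCE = "import json as js\n" + BROKEN_SOURCE

import json as js  # imported by the caller only; other modules do not see this name

broken = load_module("broken_merge", BROKEN_SOURCE)
try:
    broken.to_text([1, 2])
    error_message = None
except NameError as exc:
    # The function looks up `js` in its own module's globals, not the caller's.
    error_message = str(exc)

fixed = load_module("fixed_merge", FIXED_SOURCE)
result = fixed.to_text([1, 2])

print(error_message)
print(result)
```

The first call fails with a NameError even though the caller has imported js, while the second works because the import lives in the same module as the function. The same reasoning should apply to pd in MergeData.py.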

wim
  • That's weird. I am sure I tried this, because it naturally presents itself as a possible solution. Probably I forgot to save the module after adding the pandas import. Any take on number 1? – user3820991 Jul 23 '14 at 11:25
  • Regarding number 1, it is perfectly fine to do that and there will be very little difference in performance – wim Jul 23 '14 at 11:32
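If you want to check the performance point yourself, a rough way is to time the same loop with and without a function call, as in the sketch below. The function merge_pair is a made-up stand-in for merge_data() that only adds two numbers, so the call overhead is as visible as it can be; with real CSV reading and merging the work inside the function would presumably dominate. Note also that a module is imported only once and then cached, so calling a function that lives in another file costs the same per call as calling one defined in the same file.

```python
import timeit


def merge_pair(a, b):
    # Hypothetical stand-in for merge_data(); deliberately trivial work.
    return a + b


def run_inline(pairs):
    """Do the work directly in the loop body."""
    out = []
    for a, b in pairs:
        out.append(a + b)
    return out


def run_with_function(pairs):
    """Do the same work through a function call on every iteration."""
    out = []
    for a, b in pairs:
        out.append(merge_pair(a, b))
    return out


pairs = [(i, i + 1) for i in range(10_000)]

# Time 20 passes over 10,000 pairs for each variant.
inline_time = timeit.timeit(lambda: run_inline(pairs), number=20)
call_time = timeit.timeit(lambda: run_with_function(pairs), number=20)

print(f"inline:        {inline_time:.4f} s")
print(f"function call: {call_time:.4f} s")
```

The exact numbers depend on the machine and Python version, so none are quoted here; the point is only to compare the two figures against the time a single real merge takes.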