I have a long list of pandas transformation commands that I need to run against a pandas DataFrame:

df['newvar_A'] = df['somevar'] * df['somevar']
df['newvar_C'] = df['somevar'] * df['somevar']
df['newvar_D'] = df['somevar'] * df['somevar']
df['newvar_ETC'] = df['somevar'] * df['somevar']

It's a long list (about 150 lines). Is it possible to keep it in a separate script called transformations.py and include that in an already existing script? The idea is to keep the main script simple, so I would like it to look like this:

import pandas as pd
df = pd.read_csv('data.csv')
...
#Run transformations
insert file = "transformations.py"
...
#rest of the main script

Is there a Python command to call another Python script (assuming this script is located in the same folder as the working directory)?

Thanks!

Jake Shaffer
EGM8686

1 Answer

You can "import" the script, which is the best way as per this post. An import statement runs the imported file's top-level code the first time the module is imported.

A small example

sample.csv

name,age
sharon,12
shalom,10

The script which I am going to import
nameChange.py

import pandas as pd

# transform the csv file
data = pd.read_csv('sample.csv')
data.iloc[0,0] = 'justin'
data.to_csv('sample.csv',index = False)

The main code
stackoverflow.py

import pandas as pd

# before transform
data = pd.read_csv('sample.csv')
print(data)

# call the script
import nameChange

# do the work after the script runs
transformed_data = pd.read_csv('sample.csv')
print(transformed_data)

Output

     name  age
0  sharon   12
1  shalom   10
     name  age
0  justin   12
1  shalom   10

To run the above code without modifying the original CSV, pass the DataFrame between the two scripts through a pickle file:

The script which I am going to import
nameChange.py

import pandas as pd
import pickle

# load the DataFrame that stackoverflow.py saved with pickle
with open('data.sav', 'rb') as f:
    data = pickle.load(f)
data.iloc[0, 0] = 'justin'
# save the transformed df back to the same file
with open('data.sav', 'wb') as f:
    pickle.dump(data, f)

The main code
stackoverflow.py

import pandas as pd
import pickle

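# before transform
data = pd.read_csv('sample.csv')
print(data)
# save the df so that nameChange.py can load it
with open('data.sav', 'wb') as f:
    pickle.dump(data, f)

# call the script
import nameChange

# load the transformed df written by nameChange.py
with open('data.sav', 'rb') as f:
    transformed_data = pickle.load(f)

# do the work after the script runs
print(transformed_data)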
Star Rider
  • But this works only because nameChange.py reads the data again, does all the transformations, saves the result as a CSV, and the main module then opens that file. Is it possible to read the df in one .py file and then invoke a second .py file that runs the transformations on it? My guess is that each "module" can only "see" the variables that were created in its own code? – EGM8686 Jan 22 '19 at 22:46
  • A plain import does not put the other script's variables into the main program's namespace (they are reachable only as attributes of the module, e.g. nameChange.data), and the imported script cannot see the main program's variables either. If you still want to do the above without modifying the original CSV file, save the df using pickle and load that df back into the main program once the transform has run. – Star Rider Jan 23 '19 at 03:09
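
As a follow-up to the comments above: one way to avoid the intermediate file might be to wrap the transformation lines in a function and pass the DataFrame to it. The sketch below is only an illustration under assumptions; the function name apply_transformations and the three-row stand-in DataFrame are hypothetical and do not come from the question. Everything is shown in one block so it runs as is. In practice the function would sit in transformations.py and the main script would start with "from transformations import apply_transformations".

```python
import pandas as pd


# --- would live in transformations.py (hypothetical layout) ---
def apply_transformations(df):
    """Return a copy of df with the derived columns added."""
    out = df.copy()  # leave the caller's DataFrame untouched
    out['newvar_A'] = out['somevar'] * out['somevar']
    out['newvar_C'] = out['somevar'] * out['somevar']
    # ... the remaining transformation lines would go here ...
    return out


# --- would live in the main script, after:
#     from transformations import apply_transformations
# Stand-in data; the real script would call pd.read_csv('data.csv') here.
df = pd.DataFrame({'somevar': [1, 2, 3]})
df = apply_transformations(df)
print(df)
```

Because the DataFrame is passed in as an argument, the module does not need to "see" the main script's variables, and nothing has to be written to disk between the two files.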