I want to specify my training and test sets explicitly in the terminal when I run my .ipynb file there, instead of specifying them in the code. This is what I am doing at the moment:

import os
import pandas as pd

# FOR TRAINING DATA

# LISTING OUT ALL FILES PRESENT IN FOLDER PATH
path = "C:/Users/****/****/Latest_Datasets/base_out"
files = os.listdir(path)

# READING EVERY FILE IN THE FOLDER PATH AND COMBINING THEM INTO ONE DATAFRAME
frames = []
for f in files:
    # os.listdir returns bare file names, so join each one with the folder path
    data = pd.read_csv(os.path.join(path, f), delimiter='\t',
                       usecols=['details','amount','category'], encoding='utf-8')
    frames.append(data)
# DataFrame.append was removed in pandas 2.0; pd.concat also renumbers the rows here
df = pd.concat(frames, ignore_index=True)
df['index1'] = df.index
df = df[['index1','amount','details','category']]

# FOR TEST DATA

test_data = pd.read_csv('testfile.csv', delimiter='\t',
                        usecols=['xn_details','xn_amount','category'], encoding='utf-8')


x_train, y_train = (df.details, df.category)
# the test file names its text column xn_details, so test_data.details does not exist
x_test, y_test = (test_data.xn_details, test_data.category)

# After this I apply my model and get my classifications for my test.details

I want to pass the training data and the test data as parameters in the terminal instead of specifying them in the script. How do I do this? Thanks in advance.

Poornesh V

1 Answer


You can import the sys module and then use sys.argv to read arguments passed on the command line.

import sys
# everything else remains the same
# ...
test_data = pd.read_csv(sys.argv[1], delimiter='\t',
                        usecols=['xn_details','xn_amount','category'], encoding='utf-8')

sys.argv[0]  # the first element stores the Python file name, such as "test.py"
sys.argv[1]  # the second element stores the CSV file name given on the command line, which is passed to pd.read_csv()

So, in the command line you should execute the following line:

C:\>python test.py testfile.csv  #test.py is the name of your python file *.py
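
Since the question asks for both the training data and the test data, the same idea can be extended to two arguments. The following is only a minimal sketch using the standard-library argparse module; the function name `parse_args` and the argument names `train_dir` and `test_file` are hypothetical and not part of the original script.

```python
import argparse


def parse_args(argv=None):
    """Parse the training folder and the test file from the command line.

    Both argument names are hypothetical; rename them to suit your script.
    """
    parser = argparse.ArgumentParser(
        description="Train on a folder of files and classify a test file."
    )
    parser.add_argument("train_dir", help="folder containing the training files")
    parser.add_argument("test_file", help="path to the test CSV file")
    # With argv=None, argparse reads sys.argv[1:], i.e. the real command line
    return parser.parse_args(argv)


# Passing an explicit list simulates: python test.py base_out testfile.csv
args = parse_args(["base_out", "testfile.csv"])
print(args.train_dir)  # base_out
print(args.test_file)  # testfile.csv
```

In the actual script you would call parse_args() with no arguments, then use args.train_dir in place of the hard-coded path and args.test_file in place of 'testfile.csv'. Compared with reading sys.argv directly, argparse also prints a usage message when an argument is missing.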
amanb