Does anyone know how to put a stopwatch before and after training each model to evaluate which one is faster?

Question

I have created a loan risk prediction machine learning model in Python that predicts whether a borrower will be able to repay a bank loan. My model works perfectly fine, with 78% accuracy. However, my professor told me: "Put a stopwatch before and after training each model to evaluate which one is faster, or even better, hits the trade-off between speed and accuracy the best (we want fast and accurate model)." I don't know how to add a stopwatch to a model, and searching online I couldn't find any information about how to do it. Please let me know if anyone knows how to put a stopwatch before and after training each model.

## My Python prediction model

# Importing the Libraries
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.preprocessing import StandardScaler
import seaborn as sns
sns.set(style="white", color_codes=True)

# Importing the dataset and displaying the first 10 rows
data = pd.read_csv("credit_train.csv")
data.head(10)

# Find null values
data.isnull().sum()

# Drop null records
data = data.dropna(axis=0)

# Get basic information and statistics
data.describe()

# Check number of unique values
data["Home Ownership"].unique()
data["Home Ownership"].value_counts()

# Data Representation
sns.FacetGrid(data, hue="Loan Status", height=4) \
.map(plt.scatter,"Current Loan Amount","Monthly Debt") \
.add_legend()
plt.show()

# Categorical attributes visualization
sns.countplot(x="Loan Status",data=data)
sns.countplot(x="Term",data=data)
sns.countplot(x="Years in current job",data=data)
sns.countplot(x="Home Ownership",data=data)
sns.countplot(x="Loan Status",hue="Home Ownership",data=data)
sns.countplot(x="Loan Status",hue="Term",data=data)

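# Numerical attributes visualization
# (sns.distplot is deprecated; histplot with kde=True is its replacement)
sns.histplot(data['Current Loan Amount'], kde=True)
sns.histplot(data['Annual Income'], kde=True)
sns.histplot(data['Credit Score'], kde=True)
sns.histplot(data['Monthly Debt'], kde=True)
sns.histplot(data['Current Credit Balance'], kde=True)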

# Log transformation to reduce skew (adding 1 keeps zero values finite)
data['Current Loan Amount Log'] = np.log(data['Current Loan Amount']+1)
sns.histplot(data["Current Loan Amount Log"], kde=True)
data['Credit Score Log'] = np.log(data['Credit Score']+1)
sns.histplot(data["Credit Score Log"], kde=True)
data['Annual Income Log'] = np.log(data['Annual Income']+1)
sns.histplot(data["Annual Income Log"], kde=True)
data['Monthly Debt Log'] = np.log(data['Monthly Debt']+1)
sns.histplot(data["Monthly Debt Log"], kde=True)
data['Current Credit Balance Log'] = np.log(data['Current Credit Balance']+1)
sns.histplot(data["Current Credit Balance Log"], kde=True)

# Drop unnecessary columns
data = data.drop(['Loan ID', 'Customer ID', "Current Loan Amount", "Credit Score", "Annual Income", 'Years in current job', 'Current Credit Balance', 'Purpose', 'Monthly Debt'], axis=1)

# Correlation matrix of the columns given below
cols = ['Credit Score Log', 'Annual Income Log', 'Monthly Debt Log',
        'Current Credit Balance Log', 'Current Loan Amount Log', 'Tax Liens',
        'Years of Credit History', 'Maximum Open Credit']
f, ax = plt.subplots(figsize=(15, 10))
cm = np.corrcoef(data[cols].values.T)  # correlate only the listed columns
sns.set(font_scale=1.5)
hm = sns.heatmap(cm, ax=ax, cbar=True, annot=True, square=True, fmt='.2f', annot_kws={'size': 15}, yticklabels=cols, xticklabels=cols)
plt.show()

# Label Encoding
from sklearn.preprocessing import LabelEncoder
cols = ['Loan Status',"Term","Home Ownership"]
le = LabelEncoder()
for col in cols:
    data[col] = le.fit_transform(data[col])

# Separate the features (x) from the target (y)
x = data.drop(columns=['Loan Status'])
y = data['Loan Status']

# Train-Test Split
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

# Random forest model
# Importing libraries and classes
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(x_train,y_train)

# Accuracy on the training and test sets
print("Training accuracy:", model.score(x_train, y_train))
print("Test accuracy:", model.score(x_test, y_test))

# Predict the value of test dataset
predicted = model.predict(x_test)

# Generating Report
from sklearn import metrics
print(metrics.classification_report(y_test, predicted))

# Confusion Matrix
print(metrics.confusion_matrix(y_test, predicted))
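A stopwatch does not go inside the model; it wraps the line that trains it (`model.fit(x_train,y_train)` above). A minimal sketch, assuming you want something reusable for every model: a small context manager built on `time.perf_counter()`, which the Python docs describe as the clock with the highest available resolution for measuring short durations. The name `stopwatch` is hypothetical, not a library function.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label):
    """Print and record how long the wrapped block took, in seconds."""
    result = {}
    start = time.perf_counter()  # start the stopwatch
    try:
        yield result
    finally:
        # Stop the stopwatch even if the wrapped code raises an exception.
        result["seconds"] = time.perf_counter() - start
        print(f"{label} took {result['seconds']:.3f} s")


# A sum over a range stands in for training so this example runs on its own.
with stopwatch("dummy training") as timing:
    total = sum(i * i for i in range(1_000_000))

print(timing["seconds"])
```

With the question's code this would be `with stopwatch("Random forest"): model.fit(x_train, y_train)`, repeated for each model being compared.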

See this answer I gave someone recently about timing their methods. https://stackoverflow.com/a/67006419/12545290 — acrobat, Apr 30 '21 at 13:47
Use the `time` module: https://docs.python.org/3/library/time.html?highlight=time#module-time — kiranr, Apr 30 '21 at 13:47
@carperyeltsin There is *no* reason for that comment. Bram's comment is autogenerated when someone votes to close your question as a duplicate. It's merely a suggestion that your question has already been answered elsewhere. — chepner, Apr 30 '21 at 14:10

edusanketdk · Accepted Answer · 2021-04-30T13:55:18.760


from time import time

t_bef = time()
model.fit(x_train, y_train)  # replace with the code you want to time
t_aft = time()

print("training took", t_aft - t_bef, "seconds")                # stmt1
print("training took", (t_aft - t_bef) * 1000, "milliseconds")  # stmt2

You can replicate a stopwatch by reading the clock with `time.time()` right before and right after the code you want to measure; the difference is the elapsed wall-clock time in seconds. To report it in another unit, scale the difference as in stmt2, where multiplying by 1000 converts seconds to milliseconds.
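Applied to the question, the idea is to read the clock immediately before and after each `fit` call and keep the elapsed time next to that model's test accuracy. A minimal sketch under stated assumptions: it uses `time.perf_counter()` rather than `time.time()` (the former is intended for measuring intervals), `make_classification` generates placeholder data instead of loading `credit_train.csv`, and the three classifiers are arbitrary examples.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the credit data (assumption: replace with your own x, y).
x, y = make_classification(n_samples=2000, n_features=15, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)

# Illustrative candidate models; swap in whichever ones you are comparing.
models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    start = time.perf_counter()                   # stopwatch on
    model.fit(x_train, y_train)
    train_seconds = time.perf_counter() - start   # stopwatch off
    accuracy = model.score(x_test, y_test)        # scored outside the timed section
    results[name] = (train_seconds, accuracy)

# Fastest first, so speed and accuracy can be read side by side.
for name, (secs, acc) in sorted(results.items(), key=lambda item: item[1][0]):
    print(f"{name:<20} train time: {secs:.3f} s   test accuracy: {acc:.3f}")
```

A single run is noisy, especially for fast models, so repeat the measurement before concluding that one model is faster. Picking the best speed/accuracy trade-off is then a judgment on the printed pairs, for example the fastest model whose accuracy is within an acceptable margin of the most accurate one.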

answered Apr 30 '21 at 13:49 by edusanketdk, edited Apr 30 '21 at 13:55

This is not an accurate way to time code. Use the `timeit` module. – chepner Apr 30 '21 at 13:56
Yes, it might not perform accurately, but it is the most basic way to do it. – edusanketdk Apr 30 '21 at 14:00
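Following the suggestion to use `timeit`, a minimal sketch with the standard-library module: `timeit.repeat` accepts a callable, times it several separate times, and returns one duration per run. The `timeit` documentation notes that the lowest value is usually the most meaningful, since higher values are mostly caused by other processes interfering. The `train` function is a hypothetical placeholder workload so the snippet runs on its own.

```python
import timeit


# Hypothetical workload standing in for model.fit(x_train, y_train).
def train():
    return sorted(range(200_000), key=lambda v: -v)


# Time 5 separate runs, calling train() once per run (number=1).
runs = timeit.repeat(train, repeat=5, number=1)

# Report the fastest run as the least disturbed measurement.
print(f"best of {len(runs)}: {min(runs):.4f} s")
```

For the question's model this would be `timeit.repeat(lambda: model.fit(x_train, y_train), repeat=3, number=1)`; keeping `number=1` matters because each training run is already slow.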
