There are some simple pieces of code (data clustering, for example) that I use again and again, and every time I either search online or dig through my previous code.
So I thought of turning them into functions and collecting them in a module. I created a myfunctions.py file with content like:
import matplotlib.pyplot as plt  # needed for the plt.* calls below
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def kmeans(data, nclusters, elbowplot=False, maxcluster=10,
           scatter2=False, dotsize=50, colormap='viridis'):
    # nclusters has no default, so it must come before the keyword defaults
    data = pd.DataFrame(data)
    if elbowplot:
        # Fit KMeans for k = 1 .. maxcluster - 1 and plot the inertia of each fit
        sum_of_squares = []
        K = range(1, maxcluster)
        for k in K:
            kmeansAlgo = KMeans(n_clusters=k).fit(data)
            sum_of_squares.append(kmeansAlgo.inertia_)
        plt.plot(K, sum_of_squares, 'bo-')
        plt.xlabel('number of clusters')
        plt.ylabel('Sum of squared distances')
        plt.title('Elbow Method For Optimal k')
        plt.show()
    # Fit the final model and assign a cluster label to every row
    kmeansAlgo = KMeans(n_clusters=nclusters)
    kmeansAlgo.fit(data)
    kmeansgroups = kmeansAlgo.predict(data)
    if scatter2:
        # Scatter the first two columns, coloured by cluster label
        x = data.iloc[:, 0]
        y = data.iloc[:, 1]
        plt.scatter(x, y, c=kmeansgroups, s=dotsize, cmap=colormap)
    return kmeansgroups
Then, in another file, I simply type
import myfunctions
and start using my functions. However, I suspect that this approach is inefficient. For example, I have to import the modules (like pandas) again in the file that uses my functions.
So my question is: why do I have to import the modules again, and is this approach inefficient?
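The behaviour I am asking about can be reproduced with a small self-contained sketch. It uses a hypothetical throwaway module, helper_mod, written to a temporary directory, and the standard-library statistics module as a stand-in for pandas; neither is part of my real code. As far as I can tell, it shows that a name imported inside a module is not visible in the importing file, and that importing the same module a second time only hands back the object already cached in sys.modules.

```python
import sys
import tempfile
from pathlib import Path

# Write a tiny hypothetical module that imports `statistics` for its own use.
tmpdir = tempfile.mkdtemp()
Path(tmpdir, "helper_mod.py").write_text(
    "import statistics\n"
    "def mean_of(values):\n"
    "    return statistics.mean(values)\n"
)
sys.path.insert(0, tmpdir)

import helper_mod

# The function works: `statistics` lives in helper_mod's own namespace.
result = helper_mod.mean_of([1, 2, 3])

# This file has not imported `statistics` yet, so the bare name is not
# defined here, even though helper_mod has already loaded the module.
visible_before = "statistics" in globals()

# Importing it here does not load the module a second time. It only binds
# a local name to the object already cached in sys.modules.
import statistics
same_object = statistics is helper_mod.statistics is sys.modules["statistics"]

print(result, visible_before, same_object)
```

If this sketch is representative, the repeated import in my calling file is mostly a name binding rather than a second load of the library.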