For now, I am just running PCA and KNN on 400 RGB images of watches to find the most similar ones among them. I want to know how much memory my program uses at each stage, so I followed this link and wrote the following source code:
import cv2
import numpy as np
import os
import psutil
from glob import glob
from sklearn.decomposition import PCA
from sklearn import neighbors
from sklearn import preprocessing

def memory_usage():
    # Resident set size (RSS) of the whole Python process, in GB
    process = psutil.Process(os.getpid())
    print(round(process.memory_info().rss / (10 ** 9), 3), 'GB')
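
# (Optional variant, not part of the run whose output is shown below;
#  `memory_usage_gb` is just a name I made up. Returning the value would
#  let me print the difference between two stages instead of absolute RSS.)
def memory_usage_gb():
    return psutil.Process(os.getpid()).memory_info().rss / (10 ** 9)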
data = []

# Read images from file
for filename in glob('Watches/*.jpg'):
    img = cv2.imread(filename)
    height, width = img.shape[:2]
    img = np.array(img)

    # Check that all my images are of the same resolution
    if height == 529 and width == 940:
        # Reshape each image so that it is stored in one line
        # (the two concatenations flatten the (529, 940, 3) array
        #  into a 1-D vector of 529 * 940 * 3 values)
        img = np.concatenate(img, axis=0)
        img = np.concatenate(img, axis=0)
        data.append(img)

memory_usage()
# Normalise data (Normalizer scales each row/image to unit norm)
data = np.array(data)
Norm = preprocessing.Normalizer()
Norm.fit(data)
data = Norm.transform(data)
memory_usage()
# PCA model (keep as many components as needed to explain 95% of the variance)
pca = PCA(0.95)
pca.fit(data)
data = pca.transform(data)
memory_usage()
# K-Nearest neighbours (each image's 4 nearest neighbours in PCA space; the first is the image itself)
knn = neighbors.NearestNeighbors(n_neighbors=4, algorithm='ball_tree', metric='minkowski').fit(data)
distances, indices = knn.kneighbors(data)
print(indices)
memory_usage()
The output is the following:
0.334 GB # after loading images
1.712 GB # after data normalisation
1.5 GB # after pca
1.503 GB # after knn
What is the meaning of these outputs?
Do they represent the memory used at that point, and are they a direct indicator of the memory required by the objects and functions of the program up to that point (or is it more complicated than that)?
For example, why is memory usage higher after data normalisation than after PCA?
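To make the last question concrete: I was thinking of comparing the RSS figure with the size of the data array itself, roughly as in the sketch below (I have not run this as part of the program above; rss_gb and array_gb are just names I made up, and the dummy array only stands in for my real data matrix):

import os

import numpy as np
import psutil

def rss_gb():
    # Memory held by the whole Python process (resident set size), in GB
    return psutil.Process(os.getpid()).memory_info().rss / (10 ** 9)

def array_gb(arr):
    # Size of the array's own data buffer, in GB (ignores interpreter overhead)
    return arr.nbytes / (10 ** 9)

# Dummy stand-in for my data matrix: 400 flattened 529 x 940 RGB images as uint8
data = np.ones((400, 529 * 940 * 3), dtype=np.uint8)
print('process RSS :', round(rss_gb(), 3), 'GB')
print('data.nbytes :', round(array_gb(data), 3), 'GB')

Would comparing these two numbers at each stage be a meaningful way to see how much of the RSS is actually taken by my data?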