I have a folder with 38 files named AWA_s1_features.mat, AWA_s2_features.mat, ..., AWA_s38_features.mat. Each file holds an array with 28 columns but a different number of rows. For example, AWA_s1_features.mat has shape (139, 28), AWA_s2_features.mat has shape (199, 28), and so on.
Since I am doing machine learning, I need to join all these files into one large array and label each row by its source file. So the 139 rows of AWA_s1_features.mat must be labeled 1, the 199 rows of AWA_s2_features.mat must be labeled 2, and so on up to AWA_s38_features.mat, whose rows must all be labeled 38.
I wrote some code, but I have found that the files are not read in numerical order, so the labeling is wrong. For example, AWA_s1_features.mat is not the first file to be read and ends up labeled 11, while AWA_s2_features.mat ends up labeled 21.
How can I change my code so that it reads each file in the correct sequence?
Here is the code:
import numpy as np
import scipy.io as sio
import glob
read_files = glob.glob('I:/2D/Features 2D/AWA_s*.mat')
x = np.array([])
y = np.array([])
q = 1
for f in read_files:
    l = sio.loadmat(f)['features']
    x = np.concatenate((x, l), axis=0) if x.size else l
    y_temp = q*np.ones((l.shape[0],1))
    y = np.concatenate((y, y_temp), axis=0) if y.size else y_temp
    q = q + 1
sio.savemat('AWA_FeaturesAll.mat', {'x':x, 'y':y})
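One possible direction, as a minimal sketch rather than a definitive fix: take the label from the number embedded in each file name instead of relying on the order in which glob returns the paths. The helper names subject_number and sort_by_subject are hypothetical, and the sketch assumes every file name matches the AWA_s<number>_features.mat pattern described above. The hard-coded list stands in for the result of glob.glob so the example runs without the data files.

```python
import re

# Assumption: every file follows the AWA_s<number>_features.mat naming scheme.
_SUBJECT_RE = re.compile(r"AWA_s(\d+)_features\.mat$")


def subject_number(path):
    """Return the integer between 'AWA_s' and '_features.mat' in a file path."""
    match = _SUBJECT_RE.search(path)
    if match is None:
        raise ValueError(f"unexpected file name: {path!r}")
    return int(match.group(1))


def sort_by_subject(paths):
    """Return the paths ordered numerically by the number in the file name."""
    return sorted(paths, key=subject_number)


# Stand-in for the list returned by glob.glob(...), deliberately out of order.
example = [
    "I:/2D/Features 2D/AWA_s10_features.mat",
    "I:/2D/Features 2D/AWA_s2_features.mat",
    "I:/2D/Features 2D/AWA_s38_features.mat",
    "I:/2D/Features 2D/AWA_s1_features.mat",
]

ordered = sort_by_subject(example)
labels = [subject_number(p) for p in ordered]
print(labels)  # [1, 2, 10, 38]
```

In the loop above, this would mean iterating over sort_by_subject(read_files) and setting q = subject_number(f) at the top of each iteration instead of incrementing a counter. The label then depends only on the file name, so it should stay correct even if a file is missing from the folder.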