I have a folder with 38 files named AWA_s1_features.mat, AWA_s2_features.mat, ..., AWA_s38_features.mat. Each file holds an array with 28 columns but a different number of rows. For example, AWA_s1_features.mat has shape (139, 28), AWA_s2_features.mat has shape (199, 28), and so on.

Since I am doing machine learning, I need to join all these files into one large array and label each row. The 139 rows of AWA_s1_features.mat need 139 1s, the 199 rows of AWA_s2_features.mat need 199 2s, and so on up to AWA_s38_features.mat, whose rows must all be labeled 38.

I wrote some code, but I found that the files are not read in order, so the labeling is wrong. For example, AWA_s1_features.mat is not the first file to be read and has been labeled 11, and AWA_s2_features.mat has been labeled 21.

How can I improve my code so that it reads the files in the correct sequence?

Here is the code:

    import numpy as np
    import scipy.io as sio
    import glob

    read_files = glob.glob('I:/2D/Features 2D/AWA_s*.mat') 
    x = np.array([])
    y = np.array([])
    q = 1
    for f in read_files:     
        l=sio.loadmat(f)['features']
        x = np.concatenate((x, l), axis=0) if x.size else l 
        y_temp = q*np.ones((l.shape[0],1))
        y = np.concatenate((y, y_temp), axis=0) if y.size else y_temp
        q = q + 1
    sio.savemat('AWA_FeaturesAll.mat', {'x':x, 'y':y})
Aizzaac
  • This might be of some help: it lets you sort files before opening them. Pay particular attention to the 'numericalSort' option. http://stackoverflow.com/questions/12093940/reading-files-in-a-particular-order-in-python – bonafidegeek Sep 22 '16 at 19:00
  • Have you printed the list `read_files` to see what order `glob` produces? I'd also suggest making `x` and `y` plain lists, appending each file's data to them, and then concatenating just once at the end. – hpaulj Sep 23 '16 at 03:56
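
The "plain lists, concatenate once" suggestion in the comment above could look roughly like the sketch below. The helper name `stack_with_labels` is made up for illustration, and the two small arrays stand in for the `'features'` arrays that `sio.loadmat` would return; it assumes the arrays are already in the intended order.

```python
import numpy as np


def stack_with_labels(arrays):
    """Stack 2-D arrays row-wise and label every row with the 1-based
    position of the array it came from."""
    xs, ys = [], []
    for label, a in enumerate(arrays, start=1):
        xs.append(a)
        # One label per row, as a column vector like y_temp in the question.
        ys.append(np.full((a.shape[0], 1), label, dtype=float))
    # A single concatenate at the end instead of one per file.
    return np.concatenate(xs, axis=0), np.concatenate(ys, axis=0)


# Stand-ins for two loaded feature arrays (3 rows and 2 rows, 28 columns).
demo = [np.zeros((3, 28)), np.ones((2, 28))]
x, y = stack_with_labels(demo)
```

Collecting into lists avoids copying the growing `x` and `y` arrays on every iteration, and it removes the need for the `if x.size else` special case.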

1 Answer

The problem is that the file names come back in alphabetical (string) order, in which "11" comes before "2"; glob.glob itself makes no promise about order. You want numerical order, and one way to get it is to pass a key function to sorted that extracts the number from each file name, like so:

    import numpy as np
    import scipy.io as sio
    import glob

    read_files = glob.glob('I:/2D/Features 2D/AWA_s*.mat')
    x = np.array([])
    y = np.array([])
    q = 1
    # Sort by the integer that follows "_s", so that s2 comes before s11.
    for f in sorted(read_files, key=lambda f: int(f.split('_')[1][1:])):
        l = sio.loadmat(f)['features']
        x = np.concatenate((x, l), axis=0) if x.size else l
        # One label per row of this file.
        y_temp = q * np.ones((l.shape[0], 1))
        y = np.concatenate((y, y_temp), axis=0) if y.size else y_temp
        q = q + 1
    sio.savemat('AWA_FeaturesAll.mat', {'x': x, 'y': y})
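
One caveat with the key above: `f.split('_')` runs on the whole path, so it only works while no folder name contains an underscore. A slightly more defensive variant, offered as a sketch rather than the only way to do it, pulls the number out of the base name with a regular expression. The helper name `subject_number` is hypothetical, and the paths below are just example strings in the question's naming scheme.

```python
import os
import re


def subject_number(path):
    """Return the integer N from a file name like 'AWA_sN_features.mat'."""
    match = re.search(r'_s(\d+)_', os.path.basename(path))
    if match is None:
        raise ValueError('unexpected file name: ' + path)
    return int(match.group(1))


paths = [
    'I:/2D/Features 2D/AWA_s11_features.mat',
    'I:/2D/Features 2D/AWA_s2_features.mat',
    'I:/2D/Features 2D/AWA_s1_features.mat',
]
ordered = sorted(paths, key=subject_number)
# The extracted number can also serve directly as the row label.
labels = [subject_number(p) for p in ordered]
```

Using the extracted number as the label, instead of the running counter `q`, would keep the labels correct even if a file were missing from the folder.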
Matt W
  • Thanks. What is the "lambda" for? – Aizzaac Sep 27 '16 at 13:53
  • Short answer: lambda is basically an anonymous function. For more information check out [Lambda, filter, reduce and map](http://www.python-course.eu/python3_lambda.php) – Matt W Sep 27 '16 at 22:15
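
To make the "anonymous function" point concrete, here is a small sketch showing that the lambda in the answer behaves the same as a named function passed as `key`. The name `number_key` and the bare file names are illustrative only.

```python
names = ['AWA_s11_features.mat', 'AWA_s2_features.mat', 'AWA_s1_features.mat']


def number_key(name):
    # 'AWA_s11_features.mat' -> 's11' -> '11' -> 11
    return int(name.split('_')[1][1:])


# A named function and an equivalent lambda give the same ordering.
by_def = sorted(names, key=number_key)
by_lambda = sorted(names, key=lambda name: int(name.split('_')[1][1:]))

# Without a key, the names are compared as strings instead.
plain = sorted(names)
```

The lambda simply saves defining `number_key` separately when the function is only needed in one place.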