
I have a folder containing pictures of receipts that are named in a specific way: the date comes first, in reverse order (e.g. 21/11/2015 -> 15_11_21), followed by a space and then the value of the receipt (e.g. 18,45 -> 18_45).

Let's say the files are stored in C:\pictures\receipts. This folder contains 3 files:

15_11_21 18_45.jpg
15_11_22 115_28.jpg
15_12_02 3_00.jpg

I want to create an array with 3 columns. The first column contains the date of the receipt in the usual dd/mm/yyyy format, the second column contains the value as a negative number, and the third column contains the absolute path of the file. The array should look like this:

Receipts = [21/11/2015|-18,45 |C:\pictures\receipts\15_11_21 18_45.jpg
            22/11/2015|-115,28|C:\pictures\receipts\15_11_22 115_28.jpg 
            02/12/2015| -3,00 |C:\pictures\receipts\15_12_02 3_00.jpg];
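A plain MATLAB matrix cannot mix text and numbers, so a three-column layout like the one above is usually held in a cell array (or a struct array or table). A minimal sketch of the target, typed in by hand from the example, assuming the value is stored as a number (-18.45) rather than as the text '-18,45':

```matlab
% Target layout as an N-by-3 cell array: date text | negated value | full path
Receipts = { ...
    '21/11/2015', -18.45,  'C:\pictures\receipts\15_11_21 18_45.jpg'; ...
    '22/11/2015', -115.28, 'C:\pictures\receipts\15_11_22 115_28.jpg'; ...
    '02/12/2015', -3.00,   'C:\pictures\receipts\15_12_02 3_00.jpg'};

% Curly braces index the contents: row 2, column 2 is the second value
secondValue = Receipts{2, 2};
```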

I tried modifying and combining various functions, for example this one to get the full paths:

[status, list] = system( 'dir /B /S *.jpg' );  % recursive DOS listing from the current folder
result = textscan( list, '%s', 'delimiter', '\n' );
fileList = result{1}

I also tried strsplit to separate the parts of the file names, and even this function, but I cannot get the desired result.

  • To be honest, you'd probably be best served with a [Regular Expression](http://www.mathworks.com/help/matlab/matlab_prog/regular-expressions.html). I generally test with [regex101 in Python mode](https://regex101.com/#python); the syntax is mostly similar to MATLAB's (and I also use Python). Unfortunately I don't have MATLAB available to test. – sco1 Jan 18 '16 at 18:05
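Following the regular-expression suggestion, a minimal sketch using MATLAB's regexp with capture tokens on one file name. It assumes two-digit years in the 2000s and exactly two digits of cents; the pattern and variable names are illustrative only:

```matlab
% Example name following the naming scheme in the question
name = '15_11_21 18_45.jpg';

% yy_mm_dd, a space, then euros_cents; 'once' returns a 1x5 cell of char
tok = regexp(name, '^(\d{2})_(\d{2})_(\d{2}) (\d+)_(\d{2})', 'tokens', 'once');

dateStr = sprintf('%s/%s/20%s', tok{3}, tok{2}, tok{1});   % day/month/year
value   = -str2double([tok{4} '.' tok{5}]);                 % negated amount
```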

3 Answers


It looks like strsplit should do what you want. Try:

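strsplit(filename, {' ', '.'})  % '15_11_21 18_45.jpg' -> {'15_11_21', '18_45', 'jpg'}

Also, I would use MATLAB's dir rather than system, since dir works the same on every operating system, whereas system('dir ...') relies on the Windows command shell.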
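Putting those two suggestions together, a minimal sketch that lists the JPEGs in the single folder from the question with dir and splits each name with strsplit. It assumes every file follows the yy_mm_dd euros_cents.jpg pattern exactly; a name with extra spaces or dots would break the indexing:

```matlab
% Folder from the question; dir lists only this folder (no subfolders)
folder = 'C:\pictures\receipts';
files  = dir(fullfile(folder, '*.jpg'));

Receipts = cell(numel(files), 3);
for k = 1:numel(files)
    % '15_11_21 18_45.jpg' -> {'15_11_21', '18_45', 'jpg'}
    parts = strsplit(files(k).name, {' ', '.'});
    ymd   = strsplit(parts{1}, '_');        % {'15', '11', '21'}
    money = strsplit(parts{2}, '_');        % {'18', '45'}

    Receipts{k, 1} = sprintf('%s/%s/20%s', ymd{3}, ymd{2}, ymd{1});
    Receipts{k, 2} = -str2double([money{1} '.' money{2}]);
    Receipts{k, 3} = fullfile(folder, files(k).name);
end
```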

Herb
  • MATLAB's `dir` does not support searching into subdirectories. The DOS `dir` searches into subdirectories with the `/S` flag, as OP has used it. – sco1 Jan 18 '16 at 18:08
  • Very true. I interpreted the question to mean that all of the files were stored in the same subfolder ("C:\pictures\receipts"). If I misinterpreted the question, I apologize. – Herb Jan 18 '16 at 18:11
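The comment above was accurate when written (January 2016). Starting with R2016b, MATLAB's own dir accepts a '**' wildcard that searches subfolders, and each result carries a folder field. A minimal sketch, assuming R2016b or newer, that rebuilds the recursive file list without calling the DOS shell:

```matlab
% Requires R2016b or newer: '**' makes dir descend into all subfolders,
% and each entry has a 'folder' field holding its containing directory
files = dir(fullfile('C:\pictures\receipts', '**', '*.jpg'));

% Absolute path of every JPEG found, as an N-by-1 cell array
fileList = cellfun(@fullfile, {files.folder}', {files.name}', ...
                   'UniformOutput', false);
```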

A little bit "hacky":

fullpath = 'C:\pictures\receipts\15_11_21 18_45.jpg';
parts    = strsplit(fullpath, '\');
name     = parts{end};                  % '15_11_21 18_45.jpg'

% The name is yy_mm_dd, a space, then euros_cents
d = textscan(name, '%d_%d_%d %d_%d.jpg');
year  = d{1};
month = d{2};
day   = d{3};
a     = -d{4};
b     = d{5};

% %02d keeps leading zeros, e.g. 02/12/2015 and -3,00
receipt = sprintf('%02d/%02d/20%02d|%d,%02d|%s', day, month, year, a, b, fullpath)

Have a look at the formatting operators (type doc sprintf). You may want to add field-width flags so that the columns line up.
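As an illustration of those formatting operators, a minimal sketch that lines up the value column with a field width. It assumes the values are already numeric, uses %.2f to get two decimals, and then swaps the decimal point for a comma with strrep, since sprintf itself always writes a point:

```matlab
% Values taken from the three example receipts in the question
dates  = {'21/11/2015', '22/11/2015', '02/12/2015'};
values = [-18.45, -115.28, -3.00];
paths  = {'C:\pictures\receipts\15_11_21 18_45.jpg', ...
          'C:\pictures\receipts\15_11_22 115_28.jpg', ...
          'C:\pictures\receipts\15_12_02 3_00.jpg'};

rows = cell(numel(values), 1);
for k = 1:numel(values)
    % Two decimals, then a decimal comma instead of a point
    valueStr = strrep(sprintf('%.2f', values(k)), '.', ',');
    % %7s right-aligns the value in a 7-character column
    rows{k}  = sprintf('%s|%7s|%s', dates{k}, valueStr, paths{k});
end
disp(char(rows))
```

Using %-7s instead would left-align the values in the same column width.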

Matthias W.

One option, utilizing regular expressions and a data structure as the final output:

% Get list of JPEGs in the current directory + subdirectories
[~, list] = system( 'dir /B /S *.jpg' );
result = textscan( list, '%s', 'delimiter', '\n' );
fileList = result{1};

% Split out file names, could use a regex but why bother. Using cellfun
% rather than an explicit loop
[~, filenames] = cellfun(@fileparts, fileList, 'UniformOutput', false);

% Use named tokens to pull out our data for analysis
Receipts = regexp(filenames, '(?<date>\d+_\d+_\d+)\s*(?<cost>\d+_\d+)', 'names');
Receipts = [Receipts{:}];  % Dump out our nested data
[Receipts(:).fullpath] = fileList{:};  % Add file path to our structure

% Reformat costs
% Replace underscore with decimal, convert to numeric array and negate
tmp = -str2double(strrep({Receipts(:).cost}, '_', '.')); 
tmp = num2cell(tmp);  % Necessary intermediate step, because MATLAB...
[Receipts(:).cost] = tmp{:};  % Replace field in our data structure
clear tmp

% Reformat dates
formatIn = 'yy_mm_dd';
formatOut = 'dd/mm/yyyy';
pivotYear = 2000;  % Pivot year needed since we have 2-digit years
% datenum needed because we have a custom input date format
tmp = datestr(datenum({Receipts(:).date}, formatIn, pivotYear), formatOut);
tmp = cellstr(tmp);  % Necessary intermediate step, because MATLAB...
[Receipts(:).date] = tmp{:};
clear tmp

This results in a structure array, Receipts. I went this route because named fields make the data more explicit to access later. For example, to get the cost of my 2nd receipt, I could do:

Receipt2Cost = Receipts(2).cost;

Which returns:

Receipt2Cost =

 -115.2800
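If the N-by-3 layout from the question is still wanted, the struct array can be flattened into a cell array, or converted to a table. A minimal sketch; the small hand-built Receipts below is a hypothetical stand-in with the same fields (date, cost, fullpath) that the code above produces, so drop it when using the real result:

```matlab
% Hypothetical sample with the same fields as the struct array above
Receipts = struct('date', {'21/11/2015', '22/11/2015'}, ...
                  'cost', {-18.45, -115.28}, ...
                  'fullpath', {'C:\pictures\receipts\15_11_21 18_45.jpg', ...
                               'C:\pictures\receipts\15_11_22 115_28.jpg'});

% Each field becomes one column: date | cost | fullpath
ReceiptsCell = [{Receipts.date}', {Receipts.cost}', {Receipts.fullpath}'];

% Tables (R2013b or newer): one row per receipt, columns named after fields
ReceiptsTable = struct2table(Receipts(:));
```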
sco1