I have a set of .txt files named table.iterations.txt, where iterations = 1:10000 (so table.01.txt, table.02.txt, table.1001.txt, etc.; each file is smaller than 2 kB). Each txt file contains integer values, one per line, e.g.:

table.01.txt  table.02.txt ... table.1001.txt
 2              5               32
 5             19               37
19             45               58
52             88               62 
62             89               75
95                              80
99                              88
                               100   

Each txt file can contain a different number of values, where 0<value<101.

I need help reading all these files to find the percentage of occurrence of each value across all txt files. In the rough example above, value 2 is present once, value 5 twice, value 100 once, etc.

Thank you in advance.

Daniel Daranas
professor
  • Take a shot at doing this for one file and post your code? Then have a look at `dir('*.txt')` http://www.mathworks.com/help/matlab/ref/dir.html for reading all the files. I suggest you make an array called occurences that is 102 elements long, and then every time you encounter a number `n` you just do `occurences(n+1) = occurences(n+1) + 1` – Dan Apr 25 '13 at 11:41
  • 3
    Once all numbers are loaded into a variable, `[histc(X, unique(X)), unique(X)]` will give you a histogram of the occurrences. Converting to a percentage should be easy from there – user1207217 Apr 25 '13 at 11:41
  • 1
    From [process a list of files with a specific extension name in matlab](http://stackoverflow.com/a/7293443/2180721) – Oleg Apr 25 '13 at 11:45
  • 1
    @user1207217 don't you mean `histc(X, 0:101)`? – Dan Apr 25 '13 at 11:49
  • @Dan I suppose I do, in this case (p.s 1:100, he used strict inequalities) – user1207217 Apr 25 '13 at 11:52
  • Thank you guys. Your advice was spot on. I'll add the solution to my question for future reference. – professor Apr 25 '13 at 12:39
  • @professor please post the solution as a solution rather than adding it to the question and then accept your own posted solution – Dan Apr 25 '13 at 13:31
  • 1
    @Dan: I've done it, adding the latest updates to my solution. – professor Apr 25 '13 at 16:08
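
The suggestion in the comments (pool all numbers into one variable, then call `histc` with the edges `1:100`) can be sketched as follows. This is a minimal illustration, not the OP's code: `fileValues` is a hypothetical stand-in for the values read from the files and holds the three example columns from the question, and the percentage is taken relative to all values read, which is only one possible reading of "percentage of occurrence".

```matlab
% Hypothetical stand-in for the values loaded from the txt files:
% the three example columns shown in the question.
fileValues = { [2 5 19 52 62 95 99]', ...
               [5 19 45 88 89]', ...
               [32 37 58 62 75 80 88 100]' };

X = vertcat(fileValues{:});        % pool all values into one column vector
counts = histc(X, 1:100);          % counts(n) = number of times value n occurs overall
pct = 100 * counts / numel(X);     % share of each value among all values read

fprintf('value 5 occurs %d times (%.1f%% of all values)\n', counts(5), pct(5));
```

For the example data this gives a count of 1 for value 2, 2 for value 5 and 1 for value 100, matching the counts stated in the question. Recent MATLAB releases recommend `histcounts` over `histc`; `histc` is kept here because it is what the comments use.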

1 Answer

Based on the comments, and following the post linked there:

dirName = 'C:\yourpath';                           %# folder path
files = dir( fullfile(dirName,'table.*.txt') );    %# list the table.*.txt files; make sure the folder holds only the files you are interested in
files = {files.name}';                             %# file names
values = cell(numel(files),1);                     %# store the values of each file
data = cell(numel(files),1);                       %# store the per-file counts
for i=1:numel(files)
    fname = fullfile(dirName,files{i});            %# full path to file
    values{i} = load(fname);                       %# load the values from the txt file (one value per line)
    data{i} = histc(values{i},1:100);              %# count occurrences of each value 1..100 (use 1:25 if the max value is 25)
end

thestructdata=[data{:}];                           %# convert to matrix: one row per value, one column per file
occ = zeros(size(thestructdata,1),1);              %# preallocate the result
for j2=1:size(thestructdata,1)
    occ(j2,:)=histc(thestructdata(j2,:),1);        %# number of files in which value j2 occurs exactly once
end
occ=occ';                                          %# gather the results into a row vector
occperce=occ(1,:)./numel(files)*100                %# percentage of files that contain each value
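
Note that `histc(...,1)` in the second loop counts only the files in which a value occurs exactly once, so a value repeated within a single file would not be counted for that file. If repeats are possible, a presence test may be more robust. The following is a minimal sketch under that assumption; `countsPerFile` is a hypothetical small count matrix (5 values, 4 files) standing in for `thestructdata`.

```matlab
% Hypothetical per-file count matrix: rows = values 1..5, columns = 4 files.
% It stands in for thestructdata from the code above.
countsPerFile = [1 0 2 0;
                 0 0 0 0;
                 1 1 1 1;
                 0 3 0 0;
                 0 0 1 1];

present  = countsPerFile > 0;                    % true where a value occurs at least once in a file
occ      = sum(present, 2)';                     % number of files containing each value
occperce = 100 * occ / size(countsPerFile, 2);   % percentage of files containing each value
disp(occperce)
```

Here value 1 appears twice in the third file and is still counted once for that file, so `occ` is `[2 0 4 1 2]` and `occperce` is `[50 0 100 25 50]`.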

Results (for a maximum value of 25):

occ =    
    14    11    10    12    13    15    11    10    11    10     7    14    11    12    11    13     7    11    10    12    14    12    13    14    11


occperce =

  Columns 1 through 20

   56.0000   44.0000   40.0000   48.0000   52.0000   60.0000   44.0000   40.0000   44.0000   40.0000   28.0000   56.0000   44.0000   48.0000   44.0000   52.0000   28.0000   44.0000   40.0000   48.0000

  Columns 21 through 25

   56.0000   48.0000   52.0000   56.0000   44.0000

If you like, you can then delete all the txt files with: delete(fullfile(dirName,'table.*.txt'));

Community
professor