
I have written a script with a for loop that selects columns from 533 different Excel files and places them into matrices so they can be compared. However, the process is taking far too long (it ran for 3 hours yesterday and wasn't even halfway through!).

I know xlsread is naturally slow, but does anyone know how I can make my script run faster? The script is below, thanks!!

%Split the data into g's and h's
CRNum = 533; %Number of Carrington Rotation files
A(:,1) = xlsread('CR1643.xlsx','A:A'); % Set harmonic coefficient columns
A(:,2) = xlsread('CR1643.xlsx','B:B');
B(:,1) = xlsread('CR1643.xlsx','A:A');
B(:,2) = xlsread('CR1643.xlsx','B:B');

for k = 1:CRNum
    textFileName = ['CR' num2str(k+1642) '.xlsx'];
    A(:,k+2) = xlsread(textFileName,'C:C'); % for g
    B(:,k+2) = xlsread(textFileName,'D:D'); % for h
end
Luis Mendo
R Thompson
    Maybe you can combine the many files into a single one and then just read from different sheets inside it. In that case you would only need to "open the file for reading" once. See [xlsread1](http://www.mathworks.com/matlabcentral/fileexchange/22365-function-for-faster-data-transfer-matlab-%3C-%3E-excel) for an explanation why it works better. – Dev-iL Jun 16 '16 at 10:17
  • How fast does it really need to be? If you just let it run yesterday it would probably have finished by now. (you may want to print the progress as you go, if you choose to do it like this.) – Dennis Jaheruddin Jun 16 '16 at 10:18
  • I suppose it doesn't have to be amazingly fast, however yesterday I was running it on my laptop which began to overheat excessively (it's a terrible laptop) and so I decided to run it on the university campus computers today which are faster!! How do I print the progress as I go? – R Thompson Jun 16 '16 at 10:22
  • I will also take a look at xlsread1 thank you!! – R Thompson Jun 16 '16 at 10:22
  • The easiest way to print progress here: in the loop, just remove the semicolon after `textFileName = ['CR' num2str(k+1642) '.xlsx']`; alternatively, you can just print `k` (a small sketch follows after these comments). – Dennis Jaheruddin Jun 16 '16 at 10:25
  • `xlsread1` is a good idea. You don't need to combine the files. Simply using this function will save you all the time required for opening and closing the ActiveX server (you will only do it once, as specified in the documentation for `xlsread1`). – buzjwa Jun 16 '16 at 11:28
  • To show progress there's [`waitbar`](http://www.mathworks.com/help/matlab/ref/waitbar.html), and many alternatives in FEX ([example](http://www.mathworks.com/matlabcentral/fileexchange/6922-progressbar)). – Dev-iL Jun 16 '16 at 11:30
  • @Naveh Are you sure `xlsread1` works like that? I thought `xlsread1` only gives an improvement when you need to read many times from the same file. If you have lots of files, it seems like creating an ActiveX/COM object for each one is inevitable. It would be good to know I'm wrong about this... – Dev-iL Jun 16 '16 at 11:34
  • I haven't used it, but using a single ActiveX server is just what I wanted to suggest. Then I saw your comment and realized this is already possible through `xlsread1`. This *does* require using `Open` and `Close` for each file (see the example in the `xlsread1` description), but a significant amount of time can be saved just because an Excel instance is only opened once. A further speed increase can come from reading each file only once, as you and Dennis suggested. – buzjwa Jun 16 '16 at 11:47
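
A minimal sketch of the progress-reporting suggestions from the comments (the loop body with the xlsread calls is elided; the `fprintf` line and `waitbar` are alternatives, only one is needed):

h = waitbar(0, 'Reading Carrington Rotation files...');
for k = 1:CRNum
    fprintf('Processing file %d of %d\n', k, CRNum);  % simple console progress
    % ... read the k-th file as in the question ...
    waitbar(k / CRNum, h);                            % update the graphical progress bar
end
close(h);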

3 Answers


Don't use xlsread inside a loop, because it opens and then closes an Excel server each time you call it, which is time-consuming. Instead, use actxserver to open Excel once before the loop, do what you need, and close the server after the loop. For a good example of using actxserver, search for "Read Spreadsheet Data Using Excel as Automation Server" in the MATLAB help.
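
For concreteness, here is a minimal sketch of that approach, not a drop-in implementation: Excel is started once, and each workbook is opened and closed inside the loop. The 'C1:D55' range and the 55-row count are taken from the comments on the question (adjust them to your data), and the initial read of columns A:B from the first file is left out for brevity.

CRNum = 533;
Excel = actxserver('Excel.Application');   % start one Excel instance for all files
Excel.DisplayAlerts = false;               % suppress Excel dialog boxes
A = zeros(55, CRNum + 2);                  % preallocate (2 initial columns + one per file)
B = zeros(55, CRNum + 2);
for k = 1:CRNum
    wb = Excel.Workbooks.Open(fullfile(pwd, ['CR' num2str(k+1642) '.xlsx']));
    vals = cell2mat(wb.Sheets.Item(1).Range('C1:D55').Value);  % read g and h in one go
    A(:,k+2) = vals(:,1);                  % g
    B(:,k+2) = vals(:,2);                  % h
    wb.Close(false);                       % close the workbook without saving
end
Excel.Quit;
delete(Excel);                             % release the COM object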

Also take a look at readtable, which is faster than xlsread but returns a table instead of a numeric matrix.
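
A hedged sketch of the readtable alternative, assuming the files have no header row and the same column layout as in the question (otherwise drop 'ReadVariableNames' or adjust the column indices):

T = readtable('CR1643.xlsx', 'ReadVariableNames', false);  % whole sheet as a table
g = T{:,3};   % column C
h = T{:,4};   % column D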

eulerleibniz
  • find a simple example with the basic commands in MATLAB Help: http://de.mathworks.com/help/matlab/matlab_external/using-a-matlab-application-as-an-automation-client.html – bushmills Jun 16 '16 at 12:27
  • And if you are running MATLAB on a Linux machine, you have to use the Apache POI library to write data into an Excel workbook. The following function on the File Exchange gives an overview of the workflow: https://www.mathworks.com/matlabcentral/fileexchange/38591-xlwrite--generate-xls-x--files-without-excel-on-mac-linux-win – bushmills Jun 16 '16 at 12:30
  • Can't try it now, but this gave me the idea that using `num = xlsread(filename,sheet,xlRange,'basic')` might also improve performance, as it won't call Excel. – Dennis Jaheruddin Jun 17 '16 at 07:26

The most obvious improvement is to load only the parts of each file that you need, if possible. If that is not an option, try whether it helps to open each file only once: read everything you need in a single call, then assign it.

C = xlsread(textFileName,'C:D');   % one read per file instead of two
% then assign: A(:,k+2) = C(:,1);  B(:,k+2) = C(:,2);

Also check how much you are reading in each time: if the first file has many rows, the first dimension of A becomes large, and every subsequent file then has to fill (and match) that size.

As an extra: a small but simple improvement can be made at the start. Don't use four separate xlsread calls; use one and then assign the variables from the result.
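
Putting both suggestions together, a rough sketch could look like this (it assumes every file has the same number of rows, as the original code already does):

CRNum = 533;
init = xlsread('CR1643.xlsx','A:B');   % one call instead of four for the first two columns
A = init;
B = init;

for k = 1:CRNum
    textFileName = ['CR' num2str(k+1642) '.xlsx'];
    C = xlsread(textFileName,'C:D');   % one call per file instead of two
    A(:,k+2) = C(:,1);                 % g
    B(:,k+2) = C(:,2);                 % h
end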

Dennis Jaheruddin
  • So you mean something like this? `for k = 1:CRNum, textFileName = ['CR' num2str(k+1642) '.xlsx']; C = xlsread(textFileName,'C:D'); A(:,k+2) = C(:,1); B(:,k+2) = C(:,2); end` There are 55 rows being read in each time, adding columns to A. – R Thompson Jun 16 '16 at 10:27
  • As you mentioned going to a university, this job can easily be parallelized if you can access the files on multiple computers. First computer builds the matrix for `k=1:100`, next for `k=101:200` etc – Dennis Jaheruddin Jun 16 '16 at 10:27
  • @RThompson That was indeed what I had in mind. Just try it for a few files and see whether it helps. If it's still slow, profile your code and see where the time is going. If it is still mostly calls to `xlsread`, I don't think you can do much else about the logic here. Then you can try whether `xlsread1` helps. -- Bit late now, but if you ran it for 3 hours yesterday, you could have just saved the results and picked it up for another 3 hours today! – Dennis Jaheruddin Jun 16 '16 at 10:32
  • Doing what you mentioned above and printing the process seems to now take 6 seconds for each k and so I believe this should be done in no time, thank you!! – R Thompson Jun 16 '16 at 10:35
  • Well about 53 minutes in total but still considerably quicker!! – R Thompson Jun 16 '16 at 10:37
  • If you do not need to chew through a lot of data, I think the time you are spending on this is far too long. If possible, I suggest you use a more efficient storage format (if you need to do this many times with different data). If that is not possible, I believe there is not much else you can do besides waiting. If you have any way to influence how the data is written, try contacting the person responsible and explaining your problem. – patrik Jun 17 '16 at 06:53

As mentioned in this post, the easiest change is to use xlsread's 'basic' import mode. In basic mode xlsread reads the spreadsheet directly instead of starting Excel as an automation server, so it skips formula evaluation and other Excel features and reads a plain table much more quickly. For example, you can use:

xlsread('CR1643.xlsx', 1, 'A:A', 'basic')

For me, this reduced the load time from about 22 seconds to about 1 second when I tested it on an 11,000-by-7 Excel sheet.
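
Applied to the loop from the question, that would look something like the following (the sheet index 1 is an assumption, and whether whole-column ranges such as 'C:D' are accepted in basic mode may depend on your MATLAB version):

C = xlsread(textFileName, 1, 'C:D', 'basic');  % basic mode: no Excel automation server
A(:,k+2) = C(:,1);   % g
B(:,k+2) = C(:,2);   % h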

neurd