1

Here is the problem:

data = 1:0.5:(8E6+0.5);

An array of 16 million points needs to be averaged every 10,000 elements.

Like this:

x = mean(data(1:10000))

But this needs to be repeated N times, where N depends on the number of elements we average over:

range = 10000;

N = ceil(numel(data)/range);

My current method is this:

data(1) = mean(data(1,1:range));                   % first bin
for i = 2:N
    data(i) = mean(data(1,range*(i-1)+1:range*i)); % overwrite in place with the i-th bin mean
end

How can the speed be improved?

N.B.: We need to overwrite the original array of data (essentially bin the data and average it).
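For reference, the loop can be timed with `timeit`; a minimal sketch (the `binMeanLoop` wrapper below is added for illustration and is not part of the original code):

% Timing sketch (illustrative): wrap the loop in a function and time it
data  = 1:0.5:(8E6+0.5);
range = 10000;
N     = ceil(numel(data)/range);
t = timeit(@() binMeanLoop(data, range, N));
fprintf('loop version: %.1f ms\n', 1e3*t)

function out = binMeanLoop(data, range, N)
    out = zeros(1, N);
    for i = 1:N
        out(i) = mean(data(range*(i-1)+1:range*i)); % mean of the i-th bin
    end
end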

JCW
  • Have you tried using this code but with `parfor` not `for`? – Wolfie Sep 19 '17 at 16:23
  • Yeah, but that seemed to take on the order of seconds rather than the ms this gives; my CPU is only dual-core. It might have taken so long because it did not set up the parallel pool in advance? – JCW Sep 19 '17 at 16:40
  • @JCW yes; always open the pool before timing. See [this answer of mine](https://stackoverflow.com/a/32146700/5211833) on speed-up using `parfor`. I wouldn't be surprised if the vectorised solution is still faster even after pre-opening the pool, though, as these functions are multithreaded and you have only 2 cores. – Adriaan Sep 19 '17 at 16:50

3 Answers

4
data = 1:0.5:(8E6-0.5);            % your data; actually 16M-2 elements
N = 1e4;                           % amount to average over
tmp = mod(numel(data),N);          % remainder: does the data fit evenly?
data = [data nan(1,mod(N-tmp,N))]; % pad the tail with NaN if necessary (no padding when tmp is 0)
data2 = reshape(data,N,[]);        % reshape into an N-row matrix, one bin per column
out = nanmean(data2,1);            % column means, ignoring the NaN padding

Visual confirmation that it works using `plot(out)`:


Note that technically you can't do what you want if `mod(numel(data),N)` is not equal to 0, since then you'd have a remainder. I elected to average over everything in there, although ignoring the remainder is also an option; a sketch of that follows below.
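If you'd rather drop the remainder, a minimal sketch starting from the raw, unpadded data could look like this:

M   = floor(numel(data)/N)*N;          % largest multiple of N that fits
out = mean(reshape(data(1:M),N,[]),1); % leftover elements are simply discarded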

If you're sure `mod(numel(data),N)` is zero every time, you can leave all that out and reshape directly. I wouldn't recommend this though, because if your mod is not 0, it will error out on the `reshape`:

data = 1:0.5:(8E6+0.5); % 16M elements now
N = 1e4; % Amount to average over
out = sum(reshape(data,N,[]),1)./N; % alternative: column sums divided by the bin size
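To compare the two variants on your own machine, a benchmarking sketch using `timeit` (exact numbers will of course vary with hardware):

data = 1:0.5:(8E6+0.5);  % 16M elements, so mod(numel(data),N) == 0
N = 1e4;
tSum  = timeit(@() sum(reshape(data,N,[]),1)./N);
tMean = timeit(@() mean(reshape(data,N,[]),1));
fprintf('sum: %.1f ms, mean: %.1f ms\n', 1e3*tSum, 1e3*tMean)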
Adriaan
  • Thanks, this is faster: on my PC it brings the time down from 130 ms to 100 ms, but we need it to be even faster, around 15-30 ms, a factor of ~5. – JCW Sep 19 '17 at 16:07
  • @JCW I'm rather certain `mean` and `nanmean` are multithreaded, and reshaping matrices is almost free in MATLAB. That means that probably the only way to speed up further is to buy a better computer. – Adriaan Sep 19 '17 at 16:09
  • @JCW do you have an NVIDIA GPU and the distributed computing toolbox? – Ander Biguri Sep 19 '17 at 16:12
  • Is there no way this could be made any faster? (without upgrading hardware) – JCW Sep 19 '17 at 16:15
  • @AnderBiguri No I have Intel HD Graphics, however I do have access to all the MATLAB Add-ons – JCW Sep 19 '17 at 16:17
  • @JCW Using `sum(data2,1,'omitnan')/N` instead of `nanmean(data2,1)` might be a little faster – Luis Mendo Sep 19 '17 at 16:19
  • @LuisMendo Yes, that decreased it to 89 ms! – JCW Sep 19 '17 at 16:21
  • @JCW I'm thinking that the last chunk should not be divided by `N`. You need to correct the last one – Luis Mendo Sep 19 '17 at 16:22
  • @Adriaan your last edit using `sum(reshape(data,N,[]),1,'omitnan')./N;` decreased the runtime down to 49 ms! I should note that in most cases my mod will be 0, as these parameters and ratios would be set up in advance – JCW Sep 19 '17 at 16:38
  • @JCW that's all nice and well, but as I said, that's sort-of dangerous. Check the updated version; the `omitnan` can be omitted as well (pun intended) – Adriaan Sep 19 '17 at 16:39
  • @Adriaan The only issue with this solution is that it does not overwrite the original data, which is very important for us. I could add a `clear data` to the end but this itself takes a few extra ms. I also understand that you cannot use reshape as `data = reshape(data,...)` – JCW Sep 19 '17 at 17:02
  • @JCW then you use `data = sum(reshape(data,N,[]),1)./N;`? Assigning output to a variable should've been one of the first things you learned when starting MATLAB. I don't like to overwrite my initial data with averaged data, as I might need to get back to the original at some point. But yes, you can do that and it will probably be faster still, as you save memory + creating a variable – Adriaan Sep 19 '17 at 17:04
  • Ah my mistake, the whole point of this averaging is to save memory/decrease the number of points we're storing due to the large amount of data we're handling – JCW Sep 19 '17 at 17:09
2

This is a bit wasteful, but you can use `movmean` (which will handle the endpoints the way you want it to) and then subsample the output:

y = movmean(x, [0 9999]); % mean over each element and the 9999 elements after it
y = y(1:10000:end);       % keep every 10,000th value: one mean per bin

Even though this is wasteful (you're computing a lot of means you don't need), it appears to outperform the `nanmean` approach (at least on my machine).
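To check that claim on your own hardware, a sketch (it assumes `x` holds the full 16M-element vector; the subsampling step is cheap enough to ignore):

x = 1:0.5:(8E6+0.5);                             % 16M elements
tMov = timeit(@() movmean(x, [0 9999]));         % dominant cost of this approach
tRsh = timeit(@() nanmean(reshape(x,1e4,[]),1)); % the reshape-based approach
fprintf('movmean: %.1f ms, nanmean: %.1f ms\n', 1e3*tMov, 1e3*tRsh)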

=====================

There's also the option to just compensate for the extra elements you added:

x = 1:0.5:(8E6-0.5);                    % 16M-2 elements, so the last bin is short
K = 1e4;                                % bin size
Npad = ceil(length(x)/K)*K - length(x); % how many elements the last bin is missing
x((end+1):(end+Npad)) = 0;              % zero-pad the tail so the reshape fits
y = mean(reshape(x, K, []));            % column means; the zeros dilute the last bin
y(end) = y(end) * K/(K - Npad);         % rescale the last bin to undo the dilution
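A quick sanity check on that last-bin correction (a sketch; it recomputes the tail mean directly from the unpadded data):

x0   = 1:0.5:(8E6-0.5);                             % the original, unpadded data
tail = x0((floor(length(x0)/K)*K + 1):end);         % the K - Npad leftover elements
assert(abs(y(end) - mean(tail)) < 1e-6*mean(tail))  % corrected bin matches the true tail mean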
CKT
0

Reshape the data array into a 10000-by-N matrix, then compute the mean of each column using the `mean` function; a sketch of this is below.
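A minimal sketch of that idea (assuming `numel(data)` is an exact multiple of 10,000, as it is here; the result overwrites `data`, as the question requires):

data = reshape(data, 10000, []); % one 10,000-element bin per column
data = mean(data, 1);            % row vector of column means, overwriting the original array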