1

Here is the problem:

data = 1:0.5:(8E6+0.5);

An array of 16 million points needs to be averaged every 10,000 elements.

Like this:

x = mean(data(1:10000))

But this needs to be repeated N times, where N depends on the number of elements we average over:

range = 10000;

N = ceil(numel(data)/range);

My current method is this:

data(1) = mean(data(1,1:range));                   % first bin
for i = 2:N
    data(i) = mean(data(1,range*(i-1)+1:range*i)); % overwrite in place with the i-th bin mean
end

How can the speed be improved?

N.B.: We need to overwrite the original array of data (essentially bin the data and average it).
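For reference, the loop can be timed with `timeit`; a minimal sketch (the `binMeanLoop` wrapper below is added for illustration and is not part of the original code):

% Timing sketch (illustrative): wrap the loop in a function and time it
data  = 1:0.5:(8E6+0.5);
range = 10000;
N     = ceil(numel(data)/range);
t = timeit(@() binMeanLoop(data, range, N));
fprintf('loop version: %.1f ms\n', 1e3*t)

function out = binMeanLoop(data, range, N)
    out = zeros(1, N);
    for i = 1:N
        out(i) = mean(data(range*(i-1)+1:range*i)); % mean of the i-th bin
    end
end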

JCW
  • Have you tried using this code but with `parfor` not `for`? – Wolfie Sep 19 '17 at 16:23
  • Yeah, but that seemed to take on the order of seconds rather than the ms this gives; my CPU is only dual-core. It might have taken so long because it did not set up the parallel pool in advance? – JCW Sep 19 '17 at 16:40
  • @JCW yes; always open the pool before timing. See [this answer of mine](https://stackoverflow.com/a/32146700/5211833) on speed-up using `parfor`. I wouldn't be surprised if the vectorised solution is still faster even after pre-opening the pool, though, as these functions are multithreaded and you have only 2 cores. – Adriaan Sep 19 '17 at 16:50

3 Answers

4
data = 1:0.5:(8E6-0.5);            % your data; actually 16M-2 elements
N = 1e4;                           % amount to average over
tmp = mod(numel(data),N);          % remainder: does the data fit evenly?
data = [data nan(1,mod(N-tmp,N))]; % pad the tail with NaN if necessary (no padding when tmp is 0)
data2 = reshape(data,N,[]);        % reshape into an N-row matrix, one bin per column
out = nanmean(data2,1);            % column means, ignoring the NaN padding

Visual confirmation that it works using `plot(out)`:


Note that technically you can't do what you want if `mod(numel(data),N)` is not equal to 0, since then you'd have a remainder. I elected to average over everything in there, although ignoring the remainder is also an option; a sketch of that follows below.
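If you'd rather drop the remainder, a minimal sketch starting from the raw, unpadded data could look like this:

M   = floor(numel(data)/N)*N;          % largest multiple of N that fits
out = mean(reshape(data(1:M),N,[]),1); % leftover elements are simply discarded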

If you're sure `mod(numel(data),N)` is zero every time, you can leave all that out and reshape directly. I wouldn't recommend this though, because if your mod is not 0, it will error out on the `reshape`:

data = 1:0.5:(8E6+0.5); % 16M elements now
N = 1e4; % Amount to average over
out = sum(reshape(data,N,[]),1)./N; % alternative: column sums divided by the bin size
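To compare the two variants on your own machine, a benchmarking sketch using `timeit` (exact numbers will of course vary with hardware):

data = 1:0.5:(8E6+0.5);  % 16M elements, so mod(numel(data),N) == 0
N = 1e4;
tSum  = timeit(@() sum(reshape(data,N,[]),1)./N);
tMean = timeit(@() mean(reshape(data,N,[]),1));
fprintf('sum: %.1f ms, mean: %.1f ms\n', 1e3*tSum, 1e3*tMean)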
Adriaan
  • Thanks, this is faster: on my PC it brings the time down from 130 ms to 100 ms, but we need it to be even faster, around 15-30 ms, a factor of ~5. – JCW Sep 19 '17 at 16:07
  • @JCW I'm rather certain `mean` and `nanmean` are multithreaded, and reshaping matrices is almost free in MATLAB. That means that probably the only way to speed up further is to buy a better computer. – Adriaan Sep 19 '17 at 16:09
  • @JCW do you have an NVIDIA GPU and the distributed computing toolbox? – Ander Biguri Sep 19 '17 at 16:12
  • Is there no way this could be made any faster? (without upgrading hardware) – JCW Sep 19 '17 at 16:15
  • @AnderBiguri No I have Intel HD Graphics, however I do have access to all the MATLAB Add-ons – JCW Sep 19 '17 at 16:17
  • @JCW Using `sum(data2,1,'omitnan')/N` instead of `nanmean(data2,1)` might be a little faster – Luis Mendo Sep 19 '17 at 16:19
  • @LuisMendo Yes, that decreased it to 89 ms! – JCW Sep 19 '17 at 16:21
  • @JCW I'm thinking that the last chunk should not be divided by `N`. You need to correct the last one – Luis Mendo Sep 19 '17 at 16:22
  • @Adriaan your last edit using `sum(reshape(data,N,[]),1,'omitnan')./N;` decreased the runtime down to 49 ms! I should note that in most cases my mod will be 0, as these parameters and ratios would be set up in advance – JCW Sep 19 '17 at 16:38
  • @JCW that's all nice and well, but as I said, that's sort-of dangerous. Check the updated version; the `omitnan` can be omitted as well (pun intended) – Adriaan Sep 19 '17 at 16:39
  • @Adriaan The only issue with this solution is that it does not overwrite the original data, which is very important for us. I could add a `clear data` to the end but this itself takes a few extra ms. I also understand that you cannot use reshape as `data = reshape(data,...)` – JCW Sep 19 '17 at 17:02
  • @JCW then you use `data = sum(reshape(data,N,[]),1)./N;`? Assigning output to a variable should've been one of the first things you learned when starting MATLAB. I don't like to overwrite my initial data with averaged data, as I might need to get back to the original at some point. But yes, you can do that and it will probably be faster still, as you save memory + creating a variable – Adriaan Sep 19 '17 at 17:04
  • Ah my mistake, the whole point of this averaging is to save memory/decrease the number of points we're storing due to the large amount of data we're handling – JCW Sep 19 '17 at 17:09
2

This is a bit wasteful, but you can use `movmean` (which will handle the endpoints the way you want it to) and then subsample the output:

y = movmean(x, [0 9999]); % mean over each element and the 9999 elements after it
y = y(1:10000:end);       % keep every 10,000th value: one mean per bin

Even though this is wasteful (you're computing a lot of means you don't need), it appears to outperform the `nanmean` approach (at least on my machine).
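To check that claim on your own hardware, a sketch (it assumes `x` holds the full 16M-element vector; the subsampling step is cheap enough to ignore):

x = 1:0.5:(8E6+0.5);                             % 16M elements
tMov = timeit(@() movmean(x, [0 9999]));         % dominant cost of this approach
tRsh = timeit(@() nanmean(reshape(x,1e4,[]),1)); % the reshape-based approach
fprintf('movmean: %.1f ms, nanmean: %.1f ms\n', 1e3*tMov, 1e3*tRsh)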

=====================

There's also the option to just compensate for the extra elements you added:

x = 1:0.5:(8E6-0.5);                    % 16M-2 elements, so the last bin is short
K = 1e4;                                % bin size
Npad = ceil(length(x)/K)*K - length(x); % how many elements the last bin is missing
x((end+1):(end+Npad)) = 0;              % zero-pad the tail so the reshape fits
y = mean(reshape(x, K, []));            % column means; the zeros dilute the last bin
y(end) = y(end) * K/(K - Npad);         % rescale the last bin to undo the dilution
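A quick sanity check on that last-bin correction (a sketch; it recomputes the tail mean directly from the unpadded data):

x0   = 1:0.5:(8E6-0.5);                             % the original, unpadded data
tail = x0((floor(length(x0)/K)*K + 1):end);         % the K - Npad leftover elements
assert(abs(y(end) - mean(tail)) < 1e-6*mean(tail))  % corrected bin matches the true tail mean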
CKT
0

Reshape the data array into a 10000-by-N matrix, then compute the mean of each column using the `mean` function; a sketch of this is below.
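A minimal sketch of that idea (assuming `numel(data)` is an exact multiple of 10,000, as it is here; the result overwrites `data`, as the question requires):

data = reshape(data, 10000, []); % one 10,000-element bin per column
data = mean(data, 1);            % row vector of column means, overwriting the original array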