How to average parameters for a grid in utm format avoiding for loops

Question

Hi I have a small problem. I have 1 sec time resolution gps data in utm (x,y) with speed for one year and I would like to make speed averages over a 20m grid. My code works but it is really slow as i use for loops to find the coordinates which matches the grid. Any help is appreciated. kind regards matthias

   %x_d is x coordinate
   %Y_d is y coordinate
   %x_vec is the xgrid vector definition
   %y_vec is the ygrid vector definition
   %s is the speed

   for i=1:length(vec_x)
     for j=1:length(vec_y)
       ind = find(x_d<=vec_x(i)+10& x_d>vec_x(i)-10 & y_d<=vec_y(j)+10 &  y_d>vec_y(j)-10);

     Ad(j,i) = nanmean(s(ind));
   end
end

How are your grids defined? Is it one 20m grid or are you breaking all of your data into 20m grids? — sco1, Sep 17 '15 at 14:16
Hi here is my grid definition vec_y=6638960:20:6649860; vec_x=590000:20:606000; — Mat Ginny Vogginio, Sep 17 '15 at 14:17
IIUC, it seems `ind` would be a scalar or empty array, so what's the significance of `nanmean` and how can you index `s` with it? — Divakar, Sep 17 '15 at 14:27
x_d,y_d, and s are the same size. ind finds the points within the grid and nanmean averages the found values in s for this grid cell. — Mat Ginny Vogginio, Sep 17 '15 at 14:31
Doesn't `x_d<=vec_x(i)+10& x_d>vec_x(i)-10 & y_d<=vec_y(j)+10 & y_d>vec_y(j)-10` produce a scalar at each iteration? — Divakar, Sep 17 '15 at 14:32
@Divakar the formula produces logical indices. Because of the operator precedence, `+` or `-` are evaluated before `<=` operators. — JaBe, Sep 17 '15 at 15:29

score 1 · Accepted Answer · edited May 23 '17 at 11:51

It can be done with no loops using MATLAB's histcounts and accumarray functions. This question/solution is a near duplicate of this question/solution, except for possibly the use of histcounts.

histcounts is good for binning problems (which this is). [~,~,x_idx]=histcounts(x_d,x_vec) tells you which x-bin each x-coordinate is in. Similarly for y_d, y_vec.

accumarray is good for summing with repeated indices (to avoid looping). The call below sums the speed values for each bin, and then applies the @mean function to average them. The 0 tells accumarray to fill empty bins with zeros.

x_vec = 0:20;
y_vec = 0:20;
x_d   = rand(1000,1)*20;
y_d   = rand(1000,1)*20;
s     = rand(1000,1);

[~,~,x_idx] = histcounts(x_d,x_vec);
[~,~,y_idx] = histcounts(y_d,y_vec);

avg = accumarray([x_idx y_idx],s,[length(x_vec)-1,length(y_vec)-1],@mean,0)

This is brilliant. Super fast! Exactly what I needed and great with the explanation. Thanks a lot Matthias — Mat Ginny Vogginio, Sep 18 '15 at 09:05

score 0 · Answer 2 · answered Sep 17 '15 at 15:56

A) Use logical indexing, which gets rid of the time consuming find: Your code

ind = find(x_d<=vec_x(i)+10& x_d>vec_x(i)-10 & y_d<=vec_y(j)+10 &  y_d>vec_y(j)-10);

Ad(j,i) = nanmean(s(ind));

is equal to the faster

ix = x_d<=vec_x(i)+10& x_d>vec_x(i)-10 & y_d<=vec_y(j)+10 &  y_d>vec_y(j)-10;

Ad(j,i) = nanmean(s(ix));

B) Try to use your known Grid-size information to access the right grid-element directly. From the offset and element-size (20) you can infer:

ind_x = floor((x_d - min(vec_x))./20) + 1;
ind_y = floor((y_d - min(vec_y))./20) + 1;

Then you would loop throught your grid and pick the positions. Maybe this could be improved by arrayfun.

for i=1:length(vec_x)
 for j=1:length(vec_y)
  ix = ind_x == i & ind_y == j;
  Ad(j,i) = nanmean(s(ix));
 end
end

Thanks, this works perfectly, however it is pretty slow still for 1 month of 1 sec data it takes around 40 min on my computer. The code from Geoff below does the job in 1 min or so. ind_x = floor((x_d - min(vec_x))./20) + 1; ind_y = floor((y_d - min(vec_y))./20) + 1; was very usefull as it does pretty much the same as [~,~,x_idx] = histcounts(x_d,x_vec); [~,~,y_idx] = histcounts(y_d,y_vec); from the code below.Thanks alot! — Mat Ginny Vogginio, Sep 18 '15 at 08:54

How to average parameters for a grid in utm format avoiding for loops

2 Answers2