0

Let's assume that we have a table with two columns. The table contains data and our goal is to sort that table.

Assume our data looks like this, where y1 and y2 are the data in the columns.

enter image description here

You can produce that plot with MATLAB or GNU Octave.

% Simulate the model
[t,y] = ode45(@odefunc,[0 20],[1; -2]);

% Plot the simulation
close all
plot(t,y(:,1),'-r',t,y(:,2),'-b')
title('Solution of van der Pol Equation (\mu = 1) with ODE45');
xlabel('Time t');
ylabel('Solution y');
legend('y_1','y_2')
grid on

function dydt = odefunc(t,y)
  dydt = [y(2); (1-0.1*y(1)^2)*y(2)-y(1) + 1];
end

If we look above the plot, we are going to se the data like this:

enter image description here

You can create that plot with this code:

% Plot 3D bar
figure
imagesc(y)
colorbar

Here we can see that the plot have a very much like a "table-look". My question is what algorithm is used when sorting the rows in the table so every row that looks almost the same, have it's own unique position in the table.

For example, if we have a table like this.

0 2 4
1 3 5
2 4 6
3 5 7
4 6 8
5 7 9
0 2 4
1 3 5
2 4 6
3 5 7
4 6 8
5 7 9
0 2 4
1 3 5
2 4 6
3 5 7
4 6 8
5 7 9
0 2 4
1 3 5

The code if you want to create that table.

j = 0;
rows = 20;
for i = 1:rows
  disp(sprintf("%i %i %i", j, j+2, j+4))
  j = j + 1;
  if(j + 4 >= 10)
    j = 0;
  end
end

We can see that there are four rows of 0 2 4 and three rows of 5 7 9. I want all rows 0 2 4 close to each other and all rows 5 7 9 close to each other. And.... 0 2 4 cannot be after 5 7 9 because then the plot would look terrible.

For example, assume that we begining with row 1, the first row 0 2 4. Then we are looking for the same rows of 0 2 4 and let's say we found four rows 0 2 4. Then we sort them.

0 2 4
0 2 4
0 2 4
0 2 4

Now next row would be 1 3 5 and we find two rows of 1 3 5. We sorting them.

0 2 4
0 2 4
0 2 4
0 2 4
1 3 5
1 3 5

After we have sorted for a while, we are going to have a table like this.

0 2 4
0 2 4
0 2 4
0 2 4
1 3 5
1 3 5
2 4 6
2 4 6
2 4 6
2 4 6
3 5 7
3 5 7
3 5 7
.
.
.
.
5 7 9
5 7 9
5 7 9

And now, we found 1 2 4, which is very similar to 0 2 4. So we need to place 1 2 4 close to 0 2 4, perhaps between 0 2 4 or 1 3 5 or after 0 2 4 or before 0 2 4. How do I even know that 1 2 4 should be placed close to 0 2 4? That's the issue!!!.

How can I sort that?

I need to do that in C-programming language because speed is most important here, but I think I will start to do it in GNU Octave. I'm pretty sure that there is a SQL-sorting algorithm I'm looking for.

Notice in practice, there are numbers, integers, 10-bit e.g values between 0-1023.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
euraad
  • 2,467
  • 5
  • 30
  • 51
  • 1
    You must define "similarity." Once you have defined it, you can write a function for it and use for example `qsort` to do the sorting. `qsort` calls your function and the return value tells `qsort` whether to exchange the items. – Paul Ogilvie May 28 '21 at 13:08
  • Any data can be presented in tabular form. Your "looks like a table" observation is not wrong, but also not much relevant. To the extent that there are sorting algorithms preferred by DBMS systems, those decisions are driven by the data structures used by the DB implementation to store the data, not by the structure of the data itself. – John Bollinger May 28 '21 at 13:09
  • @PaulOgilvie Well, I could say "+-2" for each value. For example find all `0 2 4` where each index in `0 2 4` can have a tolerance of `+-2`. Is `qsort` the right function for me? – euraad May 28 '21 at 13:10
  • Not similarity, @PaulOgilvie, but rather relative order. A similarity measure is an altogether different thing, and grouping by similarity is quite a different thing from sorting. – John Bollinger May 28 '21 at 13:11
  • 1
    @DanielMårtensson, I think you have hit the nail on the head with the question "How do I even know that `1 2 4` should be placed close to `0 2 4`?", but that's exactly where we can't help you. *You* need to decide what it means for tuples to be similar to each other. There are multiple possibilities, and sorting or grouping based on different choices will produce different results. – John Bollinger May 28 '21 at 13:21
  • @JohnBollinger Thank you for your reply. I think the solution is that I need a tolerance for each row. I think I will go with this: if I pick `0 2 4`, then I will look for other rows who are `-2 0 2`, `-1 1 3`, `0 2 4`, `1 3 5`. They would I place together as one "class". – euraad May 28 '21 at 13:28
  • @DanielMårtensson, that's not a criterion you can sort on. To perform sorting, you need to be able to determine, consistently, for every pair of tuples that can appear in your data, whether tuple1 comes before or after tuple2, or whether the two are equivalent with respect to ordering. – John Bollinger May 28 '21 at 13:31
  • For example, one possibility would be to measure similarity of vector length or (easier) vector length squared. `1 2 4` has squared length `21` == `1*1 + 2*2 + 4*4`, and `0 2 4` has squared length 20, whereas `1 3 5` has squared length 35. You could then sort your tuples by (squared) length. – John Bollinger May 28 '21 at 13:31
  • You could also use a simple sum of the elements, or a function of a subset of the elements, or a variety of other alternatives. Different choices will result in different orders, and thus different choices of similarity groups. I mentioned the vector length first because I think it's a fairly good candidate: it will tend to position tuples with small differences at several positions closer together than tuples with a large difference at just one position. – John Bollinger May 28 '21 at 13:36
  • Wouldn't it be interesting to consider each triplet as a point in a 3D world? And the compute the distance between each point? if one point is X1, Y1, Z2 and the second point is X2, Y2, Z2 then the distance is Sqrt((X2-X1)*(X2-X1) + (Y2-Y1)(Y2-Y1) + (Z2-Z1)*(Z2-Z1)). Now for sorting all the points you need a reference. For example the point (0,0,0) and then you can sort the list of point according to their distance from that reference point. You could also take as reference the point for which the sum of distance between it and all other is minimal. – fpiette May 28 '21 at 13:46
  • This question may be worth a look: [Heuristics to sort array of 2D/3D points according their mutual distance](https://stackoverflow.com/q/37271413/5264491). The accepted answer is to use space-filling curves to reduce to 1 dimension preserving locality of data. – Ian Abbott May 28 '21 at 13:48
  • “I need to do that in C-programming language because speed is most important here, but I think I will start to do it in GNU Octave.” And then you tag with MATLAB? Why? Please use the tag system appropriately. If you want the answer to work in Octave, tag with Octave. If you want the answer to work in C, tag with C. Currently your question is ambiguous because you could be receiving answers in different languages. – Cris Luengo May 28 '21 at 14:14
  • At this stage, the problem is more a math issue than programming. – fpiette May 28 '21 at 15:06
  • @fpiette no it's not. – euraad May 28 '21 at 16:28

0 Answers0