
Is there a built-in function in Matlab that condenses a sequence of half-integers into an expression with colon operators?

For example, [1:4,5:.5:7] gives

1, 2, 3, 4, 5, 5.5, 6, 6.5, 7

Given a double array such as [1, 2, 3, 4, 5, 5.5, 6, 6.5, 7], is there a convenient way to convert it back to [1:4,5:.5:7] — or equally valid, [1:5,5.5:.5:7] — as a string?

Argyll
  • I'd be surprised if this is built in, but you could build such a function. It's not easy though, and the solution is ambiguous: `[1:4,5:.5:7]` and `[1:5,5.5:.5:7]` are the same! – Cris Luengo Oct 03 '19 at 21:30
  • @Argyll Would you be happy with any of those possible solutions? Or what's the criterion for choosing one? – Luis Mendo Oct 03 '19 at 21:34
  • To add to Cris Luengo's comments: some of the complications are (1) floating-point inaccuracies in detecting runs with the same step (try for example `format long, diff([7.1 7.2 7.3 7.4])`), and (2) avoiding outputting `[1 3 5 7 9 10 11 12]` as `'[1:2:7 9 10:12]'` (I assume that both `'[1:2:7 9:12]'` and `'[1:2:9 10:12]'` would be valid) – Luis Mendo Oct 03 '19 at 22:00
  • @LuisMendo: I would be happy with either solution. The desired output is just that it is condensed. The numbers themselves are always half integers. So without generalizing the problem, floating-point inaccuracy wouldn't hurt in my application. – Argyll Oct 03 '19 at 22:22
  • A similar question in Python (but doesn't take into account floating-point accuracy issues): https://stackoverflow.com/questions/43788106/ – Aziz Oct 03 '19 at 22:26
  • @Aziz: Thanks for the link. In other words, scan from left to right and group until step changes. Would you recommend that in Matlab too? – Argyll Oct 03 '19 at 22:38
  • I didn't notice the "half integer" spec. That solves the floating point accuracy issue – Luis Mendo Oct 03 '19 at 22:47

2 Answers

Here is a solution that

  • Adequately handles numbers that don't form a range. For example, [1 3 5 9] will be output as '[1:2:5 9]'. Similarly, [1 3 5 9 11] would give '[1:2:5 9 11]'.
  • Omits specifying step 1. For example, [9 3 4 5] would give '[9 3:5]'.
  • Omits unnecessary brackets. For example, [8 6 4 2] would give '8:-2:2', and 5 would give '5'.
  • Allows empty input. So [] would give '[]'.

x = [2 4.5 7 9.5 9 8 7 6 5 15 7.5 7 6.5 6 9 11]; % example input
sep = ' '; % define separator; it could also be comma
str = ''; % initialize output
k = 1; % first number not processed yet
while k<=numel(x)
    m = find(diff([diff(x(k:end)) inf]), 1) + 1; % may be empty
    if m>2 % if non-empty and greater than 2: range found (at least 3 numbers)
        ini = x(k);
        ste = x(k+1)-x(k);
        fin = x(k+m-1);
        if ste~=1
            str = [str num2str(ini) ':' num2str(ste) ':' num2str(fin)];
        else
            str = [str num2str(ini) ':' num2str(fin)];
        end
        k = k+m; % m numbers have been processed
    else % no range: include just one number
        str = [str num2str(x(k))];
        k = k+1; % 1 number has been processed
    end
    str = [str sep]; % add separator
end
str = strip(str,sep); % this removes the trailing separator, if it exists. Before R2016b, use `strtrim` (which only works if sep is a space)
if any(str==sep) || isempty(str)
    str = ['[' str ']']; % brackets are required
end

Examples / tests:

  • [2 4.5 7 9.5 9 8 7 6 5 15 7.5 7 6.5 6 9 11] gives '[2:2.5:9.5 9:-1:5 15 7.5:-0.5:6 9 11]'
  • [1.5 16 -0.5 -7 -9 -11] gives '[1.5 16 -0.5 -7:-2:-11]'
  • [4 2 0 -2 5 12 19] gives '[4:-2:-2 5:7:19]'
  • [-2 0 2.5 5.5] gives '[-2 0 2.5 5.5]'
  • [2 3 4 10 7 8] gives '[2:4 10 7 8]'
  • [6 4.5 3] gives '6:-1.5:3'
  • [3 4 5 6] gives '3:6'
  • 42 gives '42'
  • [] gives '[]'
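
A quick way to sanity-check strings like the ones listed above is to evaluate them back into vectors and compare with the input. The sketch below does this for a few of the listed pairs. The variable names `cases` and `ok` are introduced here for illustration only, and `str2num` evaluates its argument, so this is only appropriate for strings you produced yourself.

```matlab
% Pairs of {input vector, string reported in the list above}
cases = { ...
    [2 4.5 7 9.5 9 8 7 6 5 15 7.5 7 6.5 6 9 11], '[2:2.5:9.5 9:-1:5 15 7.5:-0.5:6 9 11]'; ...
    [1.5 16 -0.5 -7 -9 -11],                     '[1.5 16 -0.5 -7:-2:-11]'; ...
    [6 4.5 3],                                   '6:-1.5:3'; ...
    [],                                          '[]'};
ok = false(size(cases, 1), 1);
for c = 1:size(cases, 1)
    % str2num evaluates the text, so only use it on strings you trust
    ok(c) = isequal(str2num(cases{c, 2}), cases{c, 1});
end
```

If every entry of `ok` is true, each condensed string expands to exactly the vector it was built from.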

The code consists of a loop that searches for the maximum-length range starting at the current position, and then moves forward. The trickiest part is the line

m = find(diff([diff(x(k:end)) inf]), 1) + 1; % may be empty

This tries to find the maximum length m of numbers that form a range, starting from the current position k. diff(x(k:end)) computes consecutive differences, and the outer diff detects changes in those differences. The first such change, computed with find(..., 1), indicates the first number that doesn't belong to the range. There are five cases, the second of which explains why the inf is needed:

  • Proper range of 3 or more numbers, followed by at least a number not in that range. For example, if x(k:end) is [3 5 7 15] we have diff(x(k:end)) equal to [2 2 8], diff([diff(x(k:end)) inf]) equal to [0 6 inf], find(..., 1) gives 2, and m is 3.
  • Proper range of 3 or more numbers, which ends the array. For example, if x(k:end) is [3 5 7] we have diff(x(k:end)) equal to [2 2], diff([diff(x(k:end)) inf]) equal to [0 inf], find(..., 1) gives 2, and m is 3. This is why inf is needed; without it the result would be m=[], which would be incorrectly interpreted as "no range" by the if branch.
  • No proper range; there are more than 2 numbers remaining. For example, if x(k:end) is [3 6 7] we have diff(x(k:end)) equal to [3 1], diff([diff(x(k:end)) inf]) equal to [-2 inf], find(..., 1) gives 1, and m is 2. This means that there is a range of two numbers; but it is not a proper range, so the if branch will disregard it, and execution will proceed with the else part.
  • No proper range; there are only 2 numbers remaining. For example, if x(k:end) is [3 6] we have diff(x(k:end)) equal to 3, diff([diff(x(k:end)) inf]) equal to [inf], find(..., 1) gives 1, and m is 2. Again, there is a range of two numbers, but it is not a proper range. Note that without the included inf we would have m=[] instead of the "correct" 2, but that would also be valid to trigger the else part (in which the actual value of m is not used).
  • No proper range; there is only 1 number remaining, that is, the current number ends the array. For example, if x(k:end) is just 3 we have diff(x(k:end)) equal to [], diff([diff(x(k:end)) inf]) equal to [], find(..., 1) gives [], and m is []. Although m "should" be 1, [] is just as valid to trigger the else part.
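
The five cases can be reproduced with a short script that applies the same expression to each example tail. This is only an illustrative sketch: the names `tails` and `isRange` are not part of the solution above, and the printed text is just for inspection.

```matlab
% Remaining tail x(k:end) for each of the five cases described above
tails = {[3 5 7 15], [3 5 7], [3 6 7], [3 6], 3};
m = cell(size(tails));
for c = 1:numel(tails)
    % same expression as in the loop; the result may be empty
    m{c} = find(diff([diff(tails{c}) inf]), 1) + 1;
    % mirrors the test `if m>2`, which is false for an empty m
    isRange = ~isempty(m{c}) && m{c} > 2;
    fprintf('case %d: m = %s, proper range: %d\n', c, mat2str(m{c}), isRange);
end
```

Only the first two tails pass the `m>2` test; the other three fall through to the else part, where a single number is emitted.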

Note that, since the input contains integers or half integers, there are no floating-point accuracy issues, as those numbers are represented exactly up to ±2^52.
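
The exactness claim can be checked directly. The following sketch, with illustrative variable names, tests that steps between half-integers compare equal with `==`, and that a half is still representable just below 2^52 but not above it.

```matlab
x = 0:0.5:10;
stepsExact = all(diff(x) == 0.5);      % every step is exactly 0.5
belowLimit = (2^52 - 0.5) ~= 2^52;     % a half is still representable just below 2^52
aboveLimit = (2^52 + 0.5) == 2^52;     % beyond 2^52 the half is lost to rounding
```

All three flags should come out true for IEEE double precision, which is what Matlab uses by default.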

Luis Mendo

Here's a simple loop-based answer, where x is your vector. I chose 1e-10 as the threshold below which two steps are considered equal, to allow for floating-point inaccuracy.

x = [1, 2, 3, 4, 5, 5.5, 6, 6.5, 7];
dx = diff(x);
a = num2str(x(1)); % first step, start the range
for n = 2:numel(dx)
    if abs(dx(n)-dx(n-1)) > 1e-10
        if dx(n-1) ~= 1
            a = [a ':' num2str(dx(n-1)) ':' num2str(x(n)), ' ', num2str(x(n+1))];
        else
            a = [a ':' num2str(x(n)), ' ', num2str(x(n+1))];
        end
    end
end
a = [a ':' num2str(dx(end)) ':' num2str(x(end))]; % last step, close the range


> a =

    '1:5 5.5:0.5:7'
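
The same tolerance idea could, as an assumption rather than something either answer does, be applied to the `find(diff(...))` expression from the other answer. The sketch below compares the exact test with a thresholded one on a vector whose steps of 0.1 are not exactly equal in binary. The names `d2`, `m_exact` and `m_tol` are illustrative.

```matlab
x = [7.1 7.2 7.3 7.4 8 9];   % steps of 0.1 are not exactly equal in binary
tol = 1e-10;                 % same threshold as in the loop above
d2 = diff([diff(x) inf]);    % change between consecutive steps
m_exact = find(d2, 1) + 1;            % exact comparison, as in the other answer
m_tol   = find(abs(d2) > tol, 1) + 1; % changes below tol count as "same step"
```

With the tolerance, the run 7.1 to 7.4 is detected as four numbers (`m_tol` is 4), whereas the exact comparison stops after two. For half-integer input both give the same result, so the threshold only matters for more general steps.
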
bla