Finding and indexing repeating elements in an array using MATLAB

Question

I have an array A = [10 20 20 30 40 10 50];

Is there a smart way to find and locate the repeating elements of this array?

i.e.

10: [1 6]; 
20: [2 3];

I tried to use `unique`, but I failed.

*how* did you use `unique`? – Rody Oldenhuis Jun 06 '14 at 07:36

Amro · Answer 1 · 2014-06-06T08:15:03.910

2

Here is one solution:

% input array
A = [10 20 20 30 40 10 50];

% find unique elements (vals), and map values of A to indices (valsIdx)
[vals,~,valsIdx] = unique(A);

% group element locations by the above indices mapping
locs = accumarray(valsIdx, 1:numel(valsIdx), [], @(x){sort(x)});

% keep only values that are repeated
idx = cellfun(@numel, locs) > 1;
vals = vals(idx);
locs = locs(idx);

The result:

vals =
    10    20
locs = 
    [2x1 double]
    [2x1 double]

>> celldisp(locs)
locs{1} =
     1
     6
locs{2} =
     2
     3
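
To get output in the `value: [indices]` form used in the question, the grouped result can be printed in a short loop. This is only a sketch built on the code above: `keep` is an illustrative name, and `valsIdx(:)` is an added safeguard that forces a column vector in case `unique` returns a row on a given MATLAB version.

```matlab
% Sketch: repeat the grouping step, then print each repeated value
% together with the positions where it occurs in A.
A = [10 20 20 30 40 10 50];
[vals, ~, valsIdx] = unique(A);

% valsIdx(:) forces a column so accumarray sees one subscript per element
locs = accumarray(valsIdx(:), (1:numel(A)).', [], @(x){sort(x)});

keep = cellfun(@numel, locs) > 1;   % groups with more than one member
vals = vals(keep);
locs = locs(keep);

for k = 1:numel(vals)
    % mat2str turns the index vector into text such as [1 6]
    fprintf('%d: %s\n', vals(k), mat2str(locs{k}.'));
end
```

For the example array this should print `10: [1 6]` and `20: [2 3]`.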

edited Jun 06 '14 at 08:15

answered Jun 06 '14 at 07:39

Amro


Your solution works fine. What I don't understand is: how is the order of elements in `locs` defined? I inserted another `20` between `30 40` and the result for `locs{2}` is `3 5 2`. The indices are right, but they aren't sorted in any way I understand. Normally I would expect them to be either sorted or reverse sorted. Mind explaining your code a bit? – The Minion Jun 06 '14 at 08:07
@TheMinion: done, I added some comments and fixed the locations to be always sorted. *(Side note: the order that comes out of `accumarray` is somewhat peculiar, read this post if you wanna find out more about it: http://stackoverflow.com/a/17537523/97160)* – Amro Jun 06 '14 at 08:17

Rody Oldenhuis · Answer 2 · 2014-06-06T08:27:40.070

1

Here is another:

>> A = [10 20 20 30 40 10 50];
>> S = sort(A);
>> S = arrayfun(@(x) [x find(x==A)], unique(S(diff(S)==0)), 'UniformOutput', false);
>> S{:}
ans =
    10     1     6
ans =
    20     2     3

If you don't have or want to use arrayfun, you can use a plain loop:

A = [10 20 20 20 30 40 10 50];

S = sort(A);
S = unique(S(diff(S)==0));

R = cell(size(S'));
for ii = 1:numel(S)
    R{ii} = [S(ii) find(A==S(ii))];
end
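
Each entry of `R` is a row vector whose first element is the repeated value and whose remaining elements are its positions in `A`. As a minimal sketch of how to read one entry back out (the names `value` and `indices` are illustrative, not part of the answer):

```matlab
% Sketch: build R with the plain loop, then unpack its first entry.
A = [10 20 20 20 30 40 10 50];

S = sort(A);
S = unique(S(diff(S) == 0));   % values that occur more than once

R = cell(numel(S), 1);
for ii = 1:numel(S)
    R{ii} = [S(ii) find(A == S(ii))];
end

% Curly braces return the contents of a cell; R(1) would return a 1x1 cell
value   = R{1}(1);       % the repeated value
indices = R{1}(2:end);   % where it occurs in A
```

With this input, `value` is 10 and `indices` is `[1 7]`; `R{2}` holds `[20 2 3 4]`.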

edited Jun 06 '14 at 08:27

answered Jun 06 '14 at 07:40

Rody Oldenhuis


I am using an old version of MATLAB. `arrayfun` is not supported – user3270686 Jun 06 '14 at 08:03
@user3270686: then use a plain loop :) – Rody Oldenhuis Jun 06 '14 at 08:04
@user3270686: ...you'd be surprised. – Rody Oldenhuis Jun 06 '14 at 08:16
@user3270686: Read for example, [Eric Raymond’s "Rule of Simplicity"](http://en.wikipedia.org/wiki/Unix_philosophy#Eric_Raymond.E2.80.99s_17_Unix_Rules). – Rody Oldenhuis Jun 06 '14 at 08:22
@user3270686: how old is your version? `arrayfun` goes back to MATLAB 7 I think (from 10 years ago)! – Amro Jun 06 '14 at 08:23
@Amro: ... I suspect s/he just wanted a "smarter" solution than `arrayfun` :) – Rody Oldenhuis Jun 06 '14 at 08:28
I did a bit of runtime testing. I used `A = round(rand(N,1)*M);` with N = 5000 and N = 50000, and played around with different values of M. Changing M results in more or fewer repeated values. For high M (few repeated values) @RodyOldenhuis's code was about one order of magnitude (factor 10) faster than @Amro's code. BUT for low M (many repeated values) Amro's code was two orders of magnitude faster. So both codes work, and depending on the number of repeated values each has an advantage in runtime. My code works too (except that it only returns the values, not the corresponding indices) and is always between – The Minion Jun 06 '14 at 08:40
the other two codes. :D For really high N values (`>500k`) my code runs fastest, followed by @Amro's and @RodyOldenhuis's. – The Minion Jun 06 '14 at 08:41
@TheMinion: Thank you for that. Do note that when loops are involved, runtimes depend strongly on MATLAB version. If the OP really has a MATLAB version as old as s/he suggests, loops should be avoided and Amro's solution will likely win in both cases (although I'm not quite sure if and how `accumarray` was implemented in that old a MATLAB...) – Rody Oldenhuis Jun 06 '14 at 08:43
I didn't test the loop version but your first. Didn't even see that you posted the loop-solution as well :D – The Minion Jun 06 '14 at 08:44
Sorry, but the answer is an empty matrix. Is this a MATLAB version problem? `R = [1x3 double] [1x4 double]` – user3270686 Jun 06 '14 at 08:47
@user3270686: ...yes? – Rody Oldenhuis Jun 06 '14 at 08:48
@TheMinion: I think the most important point is that even for `N` quite large, the computation times of all methods are in the order of tenths of seconds or less. Unless this functionality is needed inside the kernel of some super advanced number crunching algorithm that's being used to make say, accurate climate predictions 100 years in the future (so big data + very frequently called; in which case I'd advise a whole different language anyway), I'd say it's "fast enough", and we're really discussing micro-optimization here :) – Rody Oldenhuis Jun 06 '14 at 08:54
Yes ... `R{1} = 10 1 7`, but I don't know how to handle `R`. Is it a cell array? `R(1)` gives what I wrote before – user3270686 Jun 06 '14 at 08:57
@user3270686: `R{1}(1)` is the value of the repeated element (in this case, 10) and `R{1}(2:end)` are the indices into `A` of where it's located. This is what you asked for right? – Rody Oldenhuis Jun 06 '14 at 08:59
Yes, I understood that. `R{1} = 10 1 7`, OK. But how can I store the elements of `R{1}` as variables, i.e. `x=1`, `y=7`, etc.? – user3270686 Jun 06 '14 at 09:05
@user3270686: you want to assign the *indices* to variable names? – Rody Oldenhuis Jun 06 '14 at 09:07
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/55180/discussion-between-user3270686-and-rody-oldenhuis). – user3270686 Jun 06 '14 at 09:26

Luis Mendo · Answer 3 · 2014-06-10T15:32:27.583

1

With bsxfun and arrayfun:

comp = tril(bsxfun(@eq, A(:), A(:).')); % compare all pairs of values
ind = find(sum(comp)>1); % find repeated values
values = A(ind);
positions = arrayfun(@(n) find(comp(:,n).'.*(1:numel(A))), ind, 'uni', 0);

This gives:

>> values
values =
    10    20

>> positions{:}
ans =
     1     6

ans =
     2     3
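
One caveat, worked out by hand rather than taken from the answer: when a value occurs three or more times, more than one column of `comp` has a sum above 1, so that value appears to be reported more than once. A possible variant, sketched under that assumption, keeps only the first occurrence of each repeated value. The names `eqm` and `isFirst` are illustrative.

```matlab
% Sketch: bsxfun-based variant that reports each repeated value once,
% even when it occurs three or more times.
A = [10 20 20 20 30 40 10 50];          % assumed test input containing a triple

eqm = bsxfun(@eq, A(:), A(:).');        % eqm(m,k) is true when A(m) == A(k)
isFirst = ~any(triu(eqm, 1), 1);        % true where no equal element comes earlier
ind = find(isFirst & sum(eqm, 1) > 1);  % first occurrence of each repeated value

values = A(ind);
positions = arrayfun(@(k) find(eqm(:, k)).', ind, 'UniformOutput', false);
```

For this input, `values` should be `[10 20]`, with `positions{1}` equal to `[1 7]` and `positions{2}` equal to `[2 3 4]`.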

edited Jun 10 '14 at 15:32

answered Jun 06 '14 at 08:55

Luis Mendo


The Minion · Answer 4 · 2014-06-06T08:41:41.197

0

This solution returns only the repeated values, not their indices (locations) in the array.

%Your Data
A=[10 20 20 30 40 10 50]; 
%sorted Data
A_sorted=sort(A);
%find the duplicates
idx=find(diff(A_sorted)==0);
% the unique is needed when there are more than two duplicates.
result=unique(A_sorted(idx));
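
If the locations are needed as well, the same sort-based idea can be extended by keeping the second output of `sort`, which maps each sorted element back to its original position. The following is only a sketch of that extension, not part of the original answer; `order`, `isStart`, `startPos`, `runLen`, `rep` and `locs` are illustrative names.

```matlab
% Sketch: sort-based detection of repeated values plus their original indices.
A = [10 20 20 30 40 10 50];

[A_sorted, order] = sort(A(:).');            % order(k) is the index in A of A_sorted(k)
isStart  = [true, diff(A_sorted) ~= 0];      % marks the first element of each run of equal values
startPos = find(isStart);                    % where each run begins in A_sorted
runLen   = diff([startPos, numel(A) + 1]);   % length of each run

rep    = find(runLen > 1);                   % runs longer than one element
result = A_sorted(startPos(rep));            % the repeated values

locs = cell(1, numel(rep));
for k = 1:numel(rep)
    s = startPos(rep(k));
    locs{k} = sort(order(s : s + runLen(rep(k)) - 1));  % indices into the original A
end
```

For the example array, `result` is `[10 20]`, `locs{1}` is `[1 6]` and `locs{2}` is `[2 3]`.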

edited Jun 06 '14 at 08:41

answered Jun 06 '14 at 07:41

The Minion


It detects the repeating values, but I also need the indices into the initial array (i.e. A(1) and A(6), A(2) and A(3)) – user3270686 Jun 06 '14 at 07:56
The other solutions do so. I could implement "their" solution into mine, but that isn't really useful, is it? – The Minion Jun 06 '14 at 08:02
