2

I want to find the first element that has appeared in previous positions in a vector.

For example, if the vector is:

v = [1, 3, 2, 3, 4, 5];

the answer is v(4) = 3, since 3 is the first element that has been seen twice.
Is there a way to vectorize this operation?

Update:
Here's my current solution, do you have better suggestions?

[s o] = sort(v);  % sort the array
d = diff(s);      % the first zero corresponds to the first repetitive element  
d = find(d == 0);  

o(d(1) + 1) is the index of the first element that has been seen twice.

New Update:
Following @mwengler's solution, I now come up the solution to find the first repeated element of each row of a MATRIX.

function vdup = firstDup(M)
    [SM Ord] = sort(M, 2);    % sort by row
    [rows cols] = find(~diff(SM, 1, 2));   % diff each row, and find indices of the repeated elements in sorted rows
    Mask = (size(M,2) + 1) * ones(size(M)); % create a Mask matrix with all size(M,2)+1
    ind = sub2ind(size(Ord), rows, cols+1); % add 1 to the column indices
    Mask(ind) = Ord(ind);   % get the original indices of each repeated elements in each row
    vdup = min(Mask, [], 2); % get the minimum indices of each row, which is the indices of first repeated element
Fashandge
  • 237
  • 1
  • 12
  • 4
    On your updated solution I would point out that for v = [1 3 4 4 3 5] you get the answer 5 because the code doesn't return the _first_ repeated, it returns the _smallest_ repeated (because of sort) – Steve Jun 15 '12 at 20:23

4 Answers4

3

This will work. @Steve pointed out an error in your updated solution.

[~, ~, Iv] = unique(v, 'stable');
idx = find(diff(Iv)-1, 1)+1;
el = v(idx);

After this, el will contain the first repeated element in v and idx will be its index in v.

First you use stable unique to find the unique elements. The second output argument contains the original indices of each unique element. You then run diff(Iv) - 1 to find the jumps in the original indices. You use find(, 1) to get the first element and add one to get the index in the original vector. Index into the original vector to get the element you want.

Fashandge
  • 237
  • 1
  • 12
Ansari
  • 8,168
  • 2
  • 23
  • 34
  • +1 though I think you should break it into two steps, since the OP is also interested in getting the index as well as the value: `idx = find(diff(Iv)-1, 1)+1` then `el = v(idx)` – Amro Jun 15 '12 at 21:07
  • 2
    for those on older versions of MATLAB (`'stable'` option was introduced in R2012a), use `[~,Iv] = unique(v, 'first'); Iv = sort(Iv);` instead of the first line. See this [question](http://stackoverflow.com/q/3065387/97160) – Amro Jun 15 '12 at 21:18
  • Thanks! Is there a way to do similar operations to a matrix M? I want to get the first repetitive element for each row. – Fashandge Jun 15 '12 at 21:41
  • Yeah try `unique(A, 'rows', 'stable')` (or first instead of stable if you're on an older version of MATLAB like @Amro suggested) – Ansari Jun 15 '12 at 21:52
  • There is still a slight error for this boundary case: say v = [ 1 1 1], Iv=1, diff(Iv)=[], then your code doesn't work. A slight modification can work, just use the 3rd returned argument to replace the 2nd argument Iv. – Fashandge Jun 15 '12 at 21:52
  • unique(A, 'rows', 'stable') doesn't work, it treat the whole row as an element in sorting, and remove the duplicate "rows". – Fashandge Jun 15 '12 at 21:56
  • Oh sorry, I read your comment properly now. I guess to apply it to every row in a matrix you can use arrayfun. For that boundary case, I don't think the third argument will work – Ansari Jun 15 '12 at 22:18
  • I think arrayfun is just a syntactic sugar of for-loop, so it won't improve performance significantly, right? – Fashandge Jun 16 '12 at 00:57
  • Well it's vectorized, so the looping happens inside C, with whatever fun optimizations they have. There's no way to save on the computation, so you just optimize the way it's done. – Ansari Jun 16 '12 at 01:06
1

The answer @Fash originally proposed ALMOST works. Going further down his path:

sv = sort(v);
repeated = sv(~diff(sv));
ifr = find(ismember(v,repeated),'first');
ir2 = find(v==v(ifr));
index_desired = ir2(2);
value_desired = v(index_desired);
mwengler
  • 2,738
  • 1
  • 19
  • 32
  • Thank you very much! Your answer ALMOST works. Going further down your path, I get the solution even for returning the first repeated element of each row of a MATRIX! See my updated question. – Fashandge Jun 16 '12 at 00:58
1
idx = find(any(triu(bsxfun(@eq, v, v.'), 1)), 1);
el = v(idx);

How this works: bsxfun(...) compares each entry of v with each other. triu(...,1) keeps only matches with a previous element (keeps only values above the diagonal). any tells which entries have a match with some previous entry. find(...,1) gives the index of the first such entry.

Luis Mendo
  • 110,752
  • 13
  • 76
  • 147
-1

store in a hashtable which you can check to already have contents?

something like:

If (hash.hasValue(i))
  return true;
else
  hash.insert(i, 1);
  return false;

where i is the key, the position, and can contain just something simple, a bit for example to allow for a small structure size.

Fallenreaper
  • 10,222
  • 12
  • 66
  • 129