3

To find the intersection of N arrays I have this implementation, which is horribly inefficient. I know there has to be an algorithm out there to speed this up.

note: myarray is the array containing all my other arrays for which I want to find the intersection for.

var
i, j, k: integer;
myarray: Array of Array of integer;
intersection: array of integer;

for I := 0 to length(myarray)-1 do
  begin
    for J := 0 to length(myarray)-1 do
    begin
      if i = j then
        continue;
      for k := 0 to length(myarray[i])-1 do
      begin
        if myarray[i][j] = myarray[j][k] then
        begin
          setLength(intersection, length(intersection)+1);
          intersection[length(intersection)-1] := myarray[j][k];
        end;
      end;
    end;
  end;

What optimization can I apply to speed this up? Is there a faster way of doing this?

EDIT: Data in arrays are unsorted.

Daisetsu
  • 4,846
  • 11
  • 50
  • 70
  • 1
    I don't see how this is valid code in the first place. You use `j` to index into the outer array *and* the inner array in the same expression. That's only valid if all the integer arrays are guaranteed to be the same length as the *number* of integer arrays you have. Also, are the arrays' contents in a predictable and consistent order (sorted)? – Rob Kennedy Jan 27 '12 at 19:22
  • Many optimizations depend on the nature of the array data. If they are sorted then you can replace the linear search with a binary search for instance. – Kenneth Cochran Jan 27 '12 at 19:35
  • I just looked at the code and realized my implementation won't work. I hadn't actually compiled it yet. – Daisetsu Jan 27 '12 at 19:46
  • 3
    That's quite a howler, Daisetsu! All the speed in the world isn't worth a thing if you don't yet have correctness nailed down. – Rob Kennedy Jan 27 '12 at 20:34

3 Answers3

10

There is a faster way: the list comparison algorithm. It allows you to compare two lists in linear time instead of quadratic time. Here's the basic idea:

  1. Sort both lists by the same criteria. (Make copies of the lists first, if you need to preserve the original ordering.)
  2. Start at the top of both lists. Pick the first item from each and compare them.
  3. If they match, handle the case and advance the index for both lists.
  4. If they don’t match, loop through, advancing the index for the list with the “lesser” value each time, until a match is found.
  5. When you reach the end of either list, you’re done. (Unless you want to handle any leftovers from the other list.)

This can be extended to deal with more than 2 lists with a bit of effort.

Mason Wheeler
  • 82,511
  • 50
  • 270
  • 477
5

Unfortunately, you have not updated your question yet, so it still is not exactly clear what you are asking. E.g. you talk about an intersection (which should search for values that exist in every single array), but from the (not working) code it seems you are simply searching for duplicates in any of the arrays.

Although Mason's answer points to an obvious general solution for these kind of algorithms, I believe it is somewhat different for such a multi-dimensional array. I worked out two routines for determination of (1) the intersection as well as (2) the duplicates. Both assume unordered content of unequal length in the arrays.

First, I decided to introduce some new types:

type
  PChain = ^TChain;
  TChain = array of Integer;
  TChains = array of TChain;

Secondly, both routines need some sorting mechanism. A very quick but dirty one is done by employing/misusing a TList:

function CompareInteger(Item1, Item2: Pointer): Integer;
begin
  Result := Integer(Item1) - Integer(Item2);
end;

procedure SortChain(var Chain: TChain);
var
  List: TList;
begin
  List := TList.Create;
  try
    List.Count := Length(Chain);
    Move(Chain[0], List.List[0], List.Count * SizeOf(Integer));
    List.Sort(CompareInteger);
    Move(List.List[0], Chain[0], List.Count * SizeOf(Integer));
  finally
    List.Free;
  end;
end;

But a much nicer implementation is gotten by adjusting the RTL code from Classes.QuickSort, which does exactly the same as the one above, without copying the array (twice):

procedure SortChain(Chain: PChain; L, R: Integer);
var
  I: Integer;
  J: Integer;
  Value: Integer;
  Temp: Integer;
begin
  repeat
    I := L;
    J := R;
    Value := Chain^[(L + R) shr 1];
    repeat
      while Chain^[I] < Value do
        Inc(I);
      while Chain^[J] > Value do
        Dec(J);
      if I <= J then
      begin
        Temp := Chain^[I];
        Chain^[I] := Chain^[J];
        Chain^[J] := Temp;
        Inc(I);
        Dec(J);
      end;
    until I > J;
    if L < J then
      SortChain(Chain, L, J);
    L := I;
  until I >= R;
end;

Intersection:

To obtain the intersection of all arrays, comparing all values in the shortest array with the values in all other arrays is enough. Because the shortest array may contain duplicate values, that small array is sorted in order to be able to ignore the duplicates. From that point it is simply a matter of finding (or rather nót finding) a same value in one of the other arrays. Sorting all other arrays is not necessary, because the chance to find a value at an earlier position than within a sorted array is 50%.

function GetChainsIntersection(const Chains: TChains): TChain;
var
  IShortest: Integer;
  I: Integer;
  J: Integer;
  K: Integer;
  Value: Integer;
  Found: Boolean;
  FindCount: Integer;
begin
  // Determine which of the chains is the shortest
  IShortest := 0;
  for I := 1 to Length(Chains) - 1 do
    if Length(Chains[I]) < Length(Chains[IShortest]) then
      IShortest := I;
  // The length of result will at maximum be the length of the shortest chain
  SetLength(Result, Length(Chains[IShortest]));
  Value := 0;
  FindCount := 0;
  // Find for every value in the shortest chain...
  SortChain(@Chains[IShortest], 0, Length(Chains[IShortest]) - 1);
  for K := 0 to Length(Chains[IShortest]) - 1 do
  begin
    if (K > 0) and (Chains[IShortest, K] = Value) then
      Continue;
    Value := Chains[IShortest, K];
    Found := False;
    for I := 0 to Length(Chains) - 1 do
      if I <> IShortest then
      begin
        Found := False;
        for J := 0 to Length(Chains[I]) - 1 do
          // ... the same value in other chains
          if Chains[I, J] = Value then
          begin
            Found := True;
            Break;
          end;
        if not Found then
          Break;
      end;
    // Add a found value to the result
    if Found then
    begin
      Result[FindCount] := Value;
      Inc(FindCount);
    end;
  end;
  // Truncate the length of result to the actual number of found values
  SetLength(Result, FindCount);
end;

Duplicates:

This also does not require sorting all arrays individually. All values are copied into a one-dimensional temporary array. After sorting thát array, it is easy to find the duplicates.

function GetDuplicateShackles(const Chains: TChains): TChain;
var
  Count: Integer;
  I: Integer;
  Temp: TChain;
  PrevValue: Integer;
begin
  // Foresee no result
  SetLength(Result, 0);
  // Count the total number of values
  Count := 0;
  for I := 0 to Length(Chains) - 1 do
    Inc(Count, Length(Chains[I]));
  if Count > 0 then
  begin
    // Copy all values to a temporary chain...
    SetLength(Temp, Count);
    Count := 0;
    for I := 0 to Length(Chains) - 1 do
    begin
      Move(Chains[I][0], Temp[Count], Length(Chains[I]) * SizeOf(Integer));
      Inc(Count, Length(Chains[I]));
    end;
    // Sort the temporary chain
    SortChain(@Temp, 0, Count - 1);
    // Find all duplicate values in the temporary chain
    SetLength(Result, Count);
    Count := 0;
    PrevValue := Temp[0];
    for I := 1 to Length(Temp) - 1 do
    begin
      if (Temp[I] = PrevValue) and
        ((Count = 0) or (Temp[I] <> Result[Count - 1])) then
      begin
        Result[Count] := PrevValue;
        Inc(Count);
      end;
      PrevValue := Temp[I];
    end;
    SetLength(Result, Count);
  end;
end;

Sample application:

And because I like to test all my code, it needed very little work to make it somewhat representative.

unit Unit1;

interface

uses
  SysUtils, Classes, Controls, Forms, StdCtrls, Grids;

type
  PChain = ^TChain;
  TChain = array of Integer;
  TChains = array of TChain;

  TForm1 = class(TForm)
    Grid: TStringGrid;
    IntersectionFullButton: TButton;
    IntersectionPartialButton: TButton;
    DuplicatesFullButton: TButton;
    DuplicatesPartialButton: TButton;
    Memo: TMemo;
    procedure FormCreate(Sender: TObject);
    procedure IntersectionButtonClick(Sender: TObject);
    procedure DuplicatesButtonClick(Sender: TObject);
  private
    procedure ClearGrid;
    procedure ShowChains(const Chains: TChains);
    procedure ShowChain(const Chain: TChain; const Title: String);
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

const
  MaxDepth = 20;

procedure FillChains(var Chains: TChains; FillUp: Boolean; MaxValue: Integer);
var
  X: Integer;
  Y: Integer;
  Depth: Integer;
begin
  SetLength(Chains, MaxDepth);
  for X := 0 to MaxDepth - 1 do
  begin
    if FillUp then
      Depth := MaxDepth
    else
      Depth := Random(MaxDepth - 2) + 3; // Minimum depth = 3
    SetLength(Chains[X], Depth);
    for Y := 0 to Depth - 1 do
      Chains[X, Y] := Random(MaxValue);
  end;
end;

procedure SortChain(Chain: PChain; L, R: Integer);
var
  I: Integer;
  J: Integer;
  Value: Integer;
  Temp: Integer;
begin
  repeat
    I := L;
    J := R;
    Value := Chain^[(L + R) shr 1];
    repeat
      while Chain^[I] < Value do
        Inc(I);
      while Chain^[J] > Value do
        Dec(J);
      if I <= J then
      begin
        Temp := Chain^[I];
        Chain^[I] := Chain^[J];
        Chain^[J] := Temp;
        Inc(I);
        Dec(J);
      end;
    until I > J;
    if L < J then
      SortChain(Chain, L, J);
    L := I;
  until I >= R;
end;

function GetChainsIntersection(const Chains: TChains): TChain;
var
  IShortest: Integer;
  I: Integer;
  J: Integer;
  K: Integer;
  Value: Integer;
  Found: Boolean;
  FindCount: Integer;
begin
  IShortest := 0;
  for I := 1 to Length(Chains) - 1 do
    if Length(Chains[I]) < Length(Chains[IShortest]) then
      IShortest := I;
  SetLength(Result, Length(Chains[IShortest]));
  Value := 0;
  FindCount := 0;
  SortChain(@Chains[IShortest], 0, Length(Chains[IShortest]) - 1);
  for K := 0 to Length(Chains[IShortest]) - 1 do
  begin
    if (K > 0) and (Chains[IShortest, K] = Value) then
      Continue;
    Value := Chains[IShortest, K];
    Found := False;
    for I := 0 to Length(Chains) - 1 do
      if I <> IShortest then
      begin
        Found := False;
        for J := 0 to Length(Chains[I]) - 1 do
          if Chains[I, J] = Value then
          begin
            Found := True;
            Break;
          end;
        if not Found then
          Break;
      end;
    if Found then
    begin
      Result[FindCount] := Value;
      Inc(FindCount);
    end;
  end;
  SetLength(Result, FindCount);
end;

function GetDuplicateShackles(const Chains: TChains): TChain;
var
  Count: Integer;
  I: Integer;
  Temp: TChain;
  PrevValue: Integer;
begin
  SetLength(Result, 0);
  Count := 0;
  for I := 0 to Length(Chains) - 1 do
    Inc(Count, Length(Chains[I]));
  if Count > 0 then
  begin
    SetLength(Temp, Count);
    Count := 0;
    for I := 0 to Length(Chains) - 1 do
    begin
      Move(Chains[I][0], Temp[Count], Length(Chains[I]) * SizeOf(Integer));
      Inc(Count, Length(Chains[I]));
    end;
    SortChain(@Temp, 0, Count - 1);
    SetLength(Result, Count);
    Count := 0;
    PrevValue := Temp[0];
    for I := 1 to Length(Temp) - 1 do
    begin
      if (Temp[I] = PrevValue) and
        ((Count = 0) or (Temp[I] <> Result[Count - 1])) then
      begin
        Result[Count] := PrevValue;
        Inc(Count);
      end;
      PrevValue := Temp[I];
    end;
    SetLength(Result, Count);
  end;
end;

{ TForm1 }

procedure TForm1.FormCreate(Sender: TObject);
begin
  Grid.ColCount := MaxDepth;
  Grid.RowCount := MaxDepth;
end;

procedure TForm1.ClearGrid;
var
  I: Integer;
begin
  for I := 0 to Grid.ColCount - 1 do
    Grid.Cols[I].Text := '';
end;

procedure TForm1.ShowChains(const Chains: TChains);
var
  I: Integer;
  J: Integer;
begin
  for I := 0 to Length(Chains) - 1 do
    for J := 0 to Length(Chains[I]) - 1 do
      Grid.Cells[I, J] := IntToStr(Chains[I, J]);
end;

procedure TForm1.ShowChain(const Chain: TChain; const Title: String);
var
  I: Integer;
begin
  if Length(Chain) = 0 then
    Memo.Lines.Add('No ' + Title)
  else
  begin
    Memo.Lines.Add(Title + ':');
    for I := 0 to Length(Chain) - 1 do
      Memo.Lines.Add(IntToStr(Chain[I]));
  end;
end;

procedure TForm1.IntersectionButtonClick(Sender: TObject);
var
  FillUp: Boolean;
  Chains: TChains;
  Chain: TChain;
begin
  ClearGrid;
  Memo.Clear;
  FillUp := Sender = IntersectionFullButton;
  if FillUp then
    FillChains(Chains, True, 8)
  else
    FillChains(Chains, False, 4);
  ShowChains(Chains);
  Chain := GetChainsIntersection(Chains);
  ShowChain(Chain, 'Intersection');
end;

procedure TForm1.DuplicatesButtonClick(Sender: TObject);
var
  Chains: TChains;
  Chain: TChain;
begin
  ClearGrid;
  Memo.Clear;
  FillChains(Chains, Sender = DuplicatesFullButton, 900);
  ShowChains(Chains);
  Chain := GetDuplicateShackles(Chains);
  ShowChain(Chain, 'Duplicates');
end;

initialization
  Randomize;

end.

Unit1.DFM:

object Form1: TForm1
  Left = 343
  Top = 429
  Width = 822
  Height = 459
  Caption = 'Form1'
  Color = clBtnFace
  Font.Charset = DEFAULT_CHARSET
  Font.Color = clWindowText
  Font.Height = -11
  Font.Name = 'MS Sans Serif'
  Font.Style = []
  OldCreateOrder = False
  OnCreate = FormCreate
  DesignSize = (
    806
    423)
  PixelsPerInch = 96
  TextHeight = 13
  object Memo: TMemo
    Left = 511
    Top = 63
    Width = 295
    Height = 360
    Anchors = [akLeft, akTop, akRight, akBottom]
    ScrollBars = ssVertical
    TabOrder = 5
  end
  object IntersectionFullButton: TButton
    Left = 511
    Top = 7
    Width = 141
    Height = 25
    Caption = 'Intersection (full chains)'
    TabOrder = 1
    OnClick = IntersectionButtonClick
  end
  object Grid: TStringGrid
    Left = 0
    Top = 0
    Width = 503
    Height = 423
    Align = alLeft
    ColCount = 20
    DefaultColWidth = 24
    DefaultRowHeight = 20
    FixedCols = 0
    RowCount = 20
    FixedRows = 0
    TabOrder = 0
  end
  object DuplicatesFullButton: TButton
    Left = 658
    Top = 7
    Width = 141
    Height = 25
    Caption = 'Duplicates (full chains)'
    TabOrder = 3
    OnClick = DuplicatesButtonClick
  end
  object IntersectionPartialButton: TButton
    Left = 511
    Top = 35
    Width = 141
    Height = 25
    Caption = 'Intersection (partial chains)'
    TabOrder = 2
    OnClick = IntersectionButtonClick
  end
  object DuplicatesPartialButton: TButton
    Left = 658
    Top = 35
    Width = 141
    Height = 25
    Caption = 'Duplicates (partial chains)'
    TabOrder = 4
    OnClick = DuplicatesButtonClick
  end
end
Community
  • 1
  • 1
NGLN
  • 43,011
  • 8
  • 105
  • 200
1
if myarray[i][j] = myarray[j][k] then

Shouldn't that be

if myarray[i][k] = myarray[j][k] then

?

Anyway, the most obvious, simple optimization you can make to this code is changing this

for I := 0 to length(myarray)-1 do
  begin
    for J := 0 to length(myarray)-1 do
    begin
      if i = j then
        continue;

into this

for I := 0 to length(myarray)-1 do
  begin
    for J := I+1 to length(myarray)-1 do
    begin

My next step would be to get rid of the outer index expressions in the inner loop:

if myarray[i][j] = myarray[j][k] then

In the I and J loops, create pointers to two arrays of integers, then do

for I := 0 to length(myarray)-1 do
  begin
    pia := @myarray[i];
    for J := I+1 to length(myarray)-1 do
    begin
      pja := @myarray[j];

Then in the inner loop you can do

if pia^[j] = pja^[k] then