2

I have two multidimensional arrays, both with email address data, but different keys, that I'm trying to compare. I need to get all of the email addresses that don't exist in both arrays, and store them. Is there a way I can speed this up?

$accts: id, Email, column1 - 13000 rows

$db_accts: id1, name, accountID,Email_Address__c - 17000 rows

Right now I have

foreach($accts as $acct){
    $exists = false;
    foreach($db_accts as $db_acct){
        if($acct['Email'] == $db_acct['Email_Address__c'])
        {   $exists = true;}
    }

    if(!$exists)
    {   $update[] = array('Email'=>$acct['Email'],'column1'=>'');
    }

}

foreach($db_accts as $db_acct){
    $exists = false;
    foreach($accts as $acct){
        if($acct['Email'] == $db_acct['Email_Address__c'])
            $exists = true; 
    }
    if(!$exists)
    {
        $update[] = array('Email'=>$db_acct['Email_Address__c'],'column1'=>'Y');
    }
}
aynber
  • 22,380
  • 8
  • 50
  • 63
  • 1
    Yes. One is from a local DB, one from a remote server. SQL queries won't help me unless I load up a temporary table. – aynber Jan 08 '13 at 19:20
  • well if the end destination is a db, it still may be best handled with (my)sql –  Jan 08 '13 at 19:21
  • 1
    It's actually going into a 3rd db. I'm comparing data from company1 to my data, then uploading it to company2. Data from both companies is accessed via SOAP/REST. – aynber Jan 08 '13 at 19:23
  • well you can just inset one lot in to the db, then the other, with email address as a unique key, the db will take care of duplicates. –  Jan 08 '13 at 19:24
  • That would only load all of the email address in the database. It wouldn't tell me which addresses don't exist in both. – aynber Jan 08 '13 at 19:26
  • on each insert you check, if it was rejected as a duplicate, so you can get a list of which where duplicates, and which where not. –  Jan 08 '13 at 19:28

5 Answers5

1

By hand

Algorithmically, what you are doing is O(|accts| * |db_accts|).

You could sort both accounts by email, which requires O(|accts| * log(|accts|)) + O(|db_accts| * log(|db_accts|)). Then, you could

  1. start with the first element of both,
  2. compare
  3. if the element of accts is bigger, store it and iterate to the next element in accts
  4. likewise if the element of db_accts is bigger
  5. if both are the same, iterate both without storing.
  6. goto 3. if there are still elements

The comparison should take O(|accts| + |db_accts|). If you get them sorted by email from the DBs, that would omit the sort-step above. Otherwise, it is O(n log(n)), which is way smaller than O(n^2).

This should result in a speedup of about 1000 in your dataset (log(17000) vs 13000).

For sorting in PHP, have a look at Sort Multi-dimensional Array by Value.

For SQL, see MySQL "Group By" and "Order By".

By PHP

You can use array_diff() with multidimensional arrays.

Community
  • 1
  • 1
serv-inc
  • 35,772
  • 9
  • 166
  • 188
0

You could probably speed this up using array_diff: http://de3.php.net/array_diff

mpaepper
  • 3,952
  • 3
  • 21
  • 28
0

Use array_diff().

See the documentation, the example is really quite clear. You input two arrays and array_diff() spits out the ones that are different. In your case emails that are not contained in both arrays.

phpisuber01
  • 7,585
  • 3
  • 22
  • 26
0

Using your own PHP code is much slower than using a standard PHP function. The standard functions of PHP run on a lower level and therefore have less overhead. Using a function such as array_diff will be much more efficient.

tvkanters
  • 3,519
  • 4
  • 29
  • 45
  • While this won't work with the arrays as I have them (I don't see anything about multidimensional arrays, comparing on different keys, etc), I suppose I can create new arrays with just the email addresses and compare those instead. – aynber Jan 08 '13 at 19:23
0

You should use array_udiff as it allows you to customize the comparison:

array_udiff($accts,$db_accts,function($act,$dbact){
     return strcasecmp($act['Email'], $db_act['Email_Address__c']);
 });
artragis
  • 3,677
  • 1
  • 18
  • 30