1

I have a big 2D array (576,000 X 4), and huge database (millions records and 10 columns, its size is in Gigabytes). The array, of course, is much smaller than the number of records in the database.

I need some effective way to compare the 2D array to the database, and delete the equal lines from the 2D array only.

Does anyone have an idea how could i apply it efficiently? The speed is very important to me.


I tried to apply it like that:

$query = mysqli_query($config, "SELECT * FROM sec ") or die(mysql_error());
while ($row = mysqli_fetch_array($query) ) {
    if ( isset($arr[$row['CHROM']][$row['POS']]) ) {
        // delete line from the 2D array
    }
}

But, i don't know how efficient it is, because i tried it just on small database, and it makes me load all the records of the database to the PHP page, and it creates a memory problem.

Another way that i check is this:

foreach ($arr as $chr=>$v) {
    foreach ($v as $pos=>$val) {
        $query = mysqli_query($config, "SELECT * FROM sec WHERE CHROM='$chr' && POS='$pos' ") or die(mysql_error());
        if (mysqli_num_rows($query) > 0) {
            // delete line from the 2D array
        }
    }
}

But, its not a good solution, because it took too much time.


edit:

my sec table looks like that:

enter image description here

the call to a item from the 2D array looks like that $arr[some_CHAROM][some_POS]

if the some_CHAROM equal to some CHAROM in the database AND some_POS equal to the POS in the same line, we have a match.


i build the 2D array from a file that the user upload to the website. and im not load it to the mySql.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
ldoroni
  • 629
  • 1
  • 7
  • 18
  • 4
    I think T-SQL and MySQL are mutually exclusive. – GolezTrol Dec 26 '14 at 13:33
  • You can't use tsql with a MySQL database, do you mean you have a MSSQL (eg SQL Server) database? – Mark Baker Dec 26 '14 at 13:34
  • Just to clarify, you are trying to compare records in a large database with what is in your 2D array, right? – Mr. Concolato Dec 26 '14 at 13:41
  • @GolezTrol good to know that. i try to get some idea's from google, and t-sql is one of the things that i see there, that i was thinking to mysely that could work. – ldoroni Dec 26 '14 at 13:53
  • Can you please show your `sec` table definition? Also, can you please explain in more detail how you know an item in your array matches an item in your `sec` table? – O. Jones Dec 26 '14 at 13:54
  • 2
    Where does your large array come from? If that comes from the database as well, then it's probably easier to do your comparison directly there – Mark Baker Dec 26 '14 at 13:56
  • Is it an one-time operation or do you need to run it again and again (as part of a lengthier processing or while processing a page request)? I'm asking because there are different strategies for these two cases. – axiac Dec 26 '14 at 14:06
  • @OllieJones i edit the post and add the things that you ask – ldoroni Dec 26 '14 at 14:14
  • @MarkBaker the user upload a file to my website, and i build the array from this file. im not upload the information that in this file to the mySql. – ldoroni Dec 26 '14 at 14:16
  • @axiac its a one-time operation. the user upload a big file to my website, and it processes the information, and return it. – ldoroni Dec 26 '14 at 14:18
  • Grumble. The questioner completely changed the question after I wrote most of an answer. Sigh. My answer was worthless. Pro tip: Don't use `SELECT *`. Name the columns you need. – O. Jones Dec 26 '14 at 14:30
  • @OllieJones the question allways was the same. at the editing i just add the information that you asked for – ldoroni Dec 26 '14 at 14:41

3 Answers3

3

The algorithm:

  1. convert the file uploaded by the user into a CSV file (if not already in this format); this is a simple task that can be done in several lines of PHP code; see function fputcsv();
  2. create a buffer table: tbl1;
  3. use LOAD DATA LOCAL INFILE to load the content of the (local) CSV file into the buffer table tbl1;
  4. use:

    DELETE tbl1
    FROM tbl1
        INNER JOIN tbl2 on tbl1.id = tbl2.id
    

    to delete from table tbl1 the rows that have matches in table tbl2. I assumed the match field is named id on both tables; change it to match your design;

  5. fetch the data from table tbl1, format it as you wish, send it to the browser;
  6. cleanup: DROP TABLE tbl1;

Because the script processes a file uploaded by an user, in order to avoid any concurrency issue you need to generate for the buffer table an unique name for each user. You can use a prefix and append the userId to it to avoid two users using the same table on the same time.

axiac
  • 68,258
  • 9
  • 99
  • 134
  • It might be worthwile to build an index on tbl1.id (assuming that there's already one on tb2.id!) +1 – STT LCU Mar 06 '15 at 15:10
  • An index on `tbl2.id` is required to help the query run as fast as it can. An index on `tbl1.id` is not really needed but it could help, indeed, if `tbl1` has many columns. Because there is no `WHERE` clause, all the rows of the table on the left side of the `JOIN` are analyzed. MySQL's query planner will put on the table having less rows the left when analyze the query (`tbl1` has less rows here). An index on `tbl1.id` helps indeed; it allows MySQL to read the values of `tbl1.id` from the index instead of reading the table data -> less bytes to load from the storage -> faster execution. – axiac Mar 06 '15 at 16:11
0

From your code snippet

foreach ($arr as $chr=>$v) {
    foreach ($v as $pos=>$val) {
        $query = mysqli_query($config, "SELECT * FROM sec WHERE CHROM='$chr' && POS='$pos' ") or die(mysql_error());
        if (mysqli_num_rows($query) > 0) {
            // delete line from the 2D array
        }
    }
}

I assume, that you want to delete based on $chr and $pos.

So, you could do the following: Assemble a single query to rule them all* :)

$ors = array();
foreach ($arr as $chr=>$v) {
    foreach ($v as $pos=>$val) {
        $ors[] = "CHROM='$chr' AND POS='$pos'";
    }
}

$deleteConditions = "(" . implode(") OR (", $ors) . ")":
$query = mysqli_query($config, "DELETE FROM sec WHERE " . $deleteConditions);

Untested, but this should give you a single query, like

DELETE FROM 
  sec 
WHERE 
  (CHROM='1' AND POS='2') OR 
  (CHROM='3' AND POS='4') OR 
  (CHROM='5' AND POS='6') OR 
  ...

depending on what $chr and $pos are.

*As Ollie Jones noted in the comments: Take care of the overall query length. If required, create a second, third, ... query until you processed all items in appropriate batches.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
dognose
  • 20,360
  • 9
  • 61
  • 107
  • 1
    You're gonna get slammed by `max_allowed_packet` if you try this. Read this: http://stackoverflow.com/questions/16335011/what-is-maximum-query-size-for-mysql – O. Jones Dec 26 '14 at 14:01
  • @OllieJones ofc. this has to be considered. But to delete 10.000 rows, you better delete in batches, than firing single queries. – dognose Dec 26 '14 at 14:02
  • im not delete the lines from the mySql, i delete the lines JUST from the 2D array – ldoroni Dec 26 '14 at 14:20
  • @Idoroni, that changes a lot... (I leave the answer, so people can see this statement) For me, your question was "how to delete database rows that are IN the 2D array." – dognose Dec 26 '14 at 14:24
0

Try following code

$servername = "localhost";
$username = "root";
$password = "";
$dbname = "drupal7";


mysql_connect($servername, $username, $password );
mysql_select_db($dbname);




$sql = "SHOW TABLES FROM $dbname";
$result = mysql_query($sql);

if (!$result) {
    echo "DB Error, could not list tables\n";
    echo 'MySQL Error: ' . mysql_error();
    exit;
}
$database1=array();
while ($row = mysql_fetch_row($result)) {

    $result1 = mysql_query("SELECT * FROM ".$row[0]);
    if(mysql_num_rows($result1)){
        $num_rows = mysql_num_rows($result1);
    // echo "Table: {$row[0]} ==>".$num_rows."<br>";
        $database1[$row[0]]=$num_rows;
    }

    // }
}
echo '<pre>';
print_r($database1);

mysql_free_result($result);
// mysql_close();

$dbname='drupal71';
mysql_select_db($dbname);


$sql = "SHOW TABLES FROM $dbname";
$result = mysql_query($sql);

if (!$result) {
    echo "DB Error, could not list tables\n";
    echo 'MySQL Error: ' . mysql_error();
    exit;
}
$database2=array();
while ($row = mysql_fetch_row($result)) {

    $result1 = mysql_query("SELECT * FROM ".$row[0]);
    if(mysql_num_rows($result1)){
        $num_rows = mysql_num_rows($result1);
    // echo "Table: {$row[0]} ==>".$num_rows."<br>";
        $database2[$row[0]]=$num_rows;
    }

    // }
}


print_r($database2);


$test = array_diff($database1, $database2);

print_r($test);die;
Arun Kushwaha
  • 1,136
  • 1
  • 8
  • 8