-1

I have combined a PHP script to count words in MySQL textfield and update another field accordingly.

It works well with relatively small tables - but when I tried with really big table (10M records) - of course I've got "PHP Fatal error: Allowed memory size of 134217728 bytes exhausted"

Could somebody hint how to modify the script below to process the data "row-by-row" ?

<?php
$con1 = mysqli_connect('localhost','USERNAME','PASSWORD','DATABASE');
if (!$con1) {
    die('Could not connect: ' . mysqli_error($con1));
}
$sql = "SELECT id FROM TableName";
$result = mysqli_query($con1, $sql);
while ($row = mysqli_fetch_assoc($result)) {
    $to = $row["id"];
    $sql1 = "SELECT textfield FROM TableName WHERE id = '$to' ";
    $result1 = mysqli_query($con1,$sql1);
    $row1 = mysqli_fetch_assoc($result1);
    $words=(str_word_count($row1['textfield'],0));
    $sql2="UPDATE TableName SET wordcount = '$words' WHERE id ='$to'";
    $result2 = mysqli_query($con1,$sql2);
}
mysqli_close($con1);
?>
Lajos Arpad
  • 64,414
  • 37
  • 100
  • 175
Serge
  • 679
  • 1
  • 9
  • 23
  • 1
    besides memory problems in the PHP code what you talked about.. @RiggsFolly the topicstarter should also research (second order) SQL injections and race conditions as this code also seams prone to these cases.. – Raymond Nijland Oct 12 '19 at 14:54
  • Gentlemen, pls, stop your counter jokes. Of course I run the script from the command line. It's very clear for me the code at line `$sql = "SELECT id FROM TableName";` tries to catch the whole table IDs - of course it exceeds the allocated memory. So the question is how to modify the script to process "id-by-id". – Serge Oct 12 '19 at 14:55
  • *"Gentlemen, pls, stop your counter jokes."*. " fair enough *"So the question is how to modify the script to process "id-by-id". "* As we don't know table structures or data within consider running it as a `UPDATEing JOIN` -> `UPDATE .... INNER JOIN ... SET ...` instead to run it on the MySQL server only... But still phantom reads (race conditions) can happen depending on the transaction isolation level or very bad performance when not indexed.. – Raymond Nijland Oct 12 '19 at 15:00
  • 1
    As we don't know table structures or data within .. See [Why should I provide a Minimal Reproducible Example for a very simple SQL query?](https://meta.stackoverflow.com/questions/333952/why-should-i-provide-a-minimal-reproducible-example-for-a-very-simple-sql-query) .. Also the `CREATE TABLE ` statement will also be usefull -> `SHOW CREATE TABLE ` .. – Raymond Nijland Oct 12 '19 at 15:01
  • 1
    You have only use one `TableName` are both the SELECTS on the same `Table`? – RiggsFolly Oct 12 '19 at 15:03
  • Yes, TableName is the same. As I wrote - such a script perfectly works with relatively small tables, tested successfully at 50 000 records. So the matter not about a table structure or data within. One field is ID, another is TEXTFIELD. So how I can "reproduce" minimal example ? This case a minimum example means whole 10M table. – Serge Oct 12 '19 at 15:06
  • *"So how I can "reproduce" minimal example ? This case a minimum example means whole 10M table."* with all respect but It is clear you didn't fully read this [linked document](https://meta.stackoverflow.com/questions/333952/why-should-i-provide-a-minimal-reproducible-example-for-a-very-simple-sql-query) -> *"It's easy to provide a sample data set. It doesn't have to be real data. It just has to fairly reflect the nature of the problem at hand. "* – Raymond Nijland Oct 12 '19 at 15:09
  • When we see a small data example (~15 records) and the indexes in the `CREATE TABLE` statement, we have a pretty good understanding what might happen in the MySQL engine when running this.. Also adding some [EXPLAIN ](https://dev.mysql.com/doc/refman/8.0/en/explain.html) for the main SELECT and those SELECT and UPDATE in the loop also would help us with that aswell.. – Raymond Nijland Oct 12 '19 at 15:15
  • Then why are there 2 SELECT queries, surely one would do! – RiggsFolly Oct 12 '19 at 15:16
  • Personally I would rewrite this within a transaction to select record in blocks of say 500 `LIMIT x,y` and that way I could also build-in a restart capability. When updating more that a few 50000 rows I always buildin a restart feature – RiggsFolly Oct 12 '19 at 15:19
  • @RiggsFolly using a large offset number for `x` in `LIMIT x,y` is not great as that can bring the MySQL Server performance wise on his knees .. – Raymond Nijland Oct 12 '19 at 15:22
  • Which one is extra SELECT ? one SELECT for id, another SELECT for TextFIELD. I suppose if somebody understands PHP better than me (as I'm not an expert at all) - he can just read the code and give a hint to a PHP CODING solution what I can try at place. There are thousands of such an answers at StackOverflow, I found many useful for me during last 5 years. Let me say again - it works perfectly with 50K records. So this is not a question how query is constructed because queries are very simple. It's not a MySQL question, it's the PHP question first of all. – Serge Oct 12 '19 at 15:24
  • Nobody can give a hint about the "unknown" as you are working with the SQL language this is also a SQL (MySQL) related question ... As SQL language is meant to be declarative in nature where you define what you want to have not how to get it.. So table structure and data for that matter can change the meaning of a SQL statement...Meaning this question is a fill in the blanks kind of question from our viewing point as we dont know table structure and data where you use SQL.. (not sure how to explain it more easy).. – Raymond Nijland Oct 12 '19 at 15:36
  • 1
    *"give a hint to a PHP CODING solution what I can try at place. "* But assuming you have a id column with a PRIMARY key you could try to batch process.. More or less the same suggestion which @RiggsFolly has given but then run `WHERE id >= 1 ORDER BY id ASC LIMIT 50000` second run in the loop should be `WHERE id >= 50001 ORDER BY id ASC LIMIT 50000` which should have (much) better performance then using large offset numbers in the `LIMIT` – Raymond Nijland Oct 12 '19 at 15:45
  • Yes, that's very near the solution ! - I tested the one - and no problem, it works ! - BUT again it process only first query. This is the PURE PHP question how to LOOP such a query within my script. Not a big difference - process ID-BY-ID - or SELECT with LIMITS your offered - the question is how to use PHP loop for both variants. If I could use MySQL in a BASH script I could just use `SELECT id FROM TableName | while read results DO...DONE` - but here is PHP...what is much faster than BASH or MySQL. This is why I mean it's the PHP question first of all. – Serge Oct 12 '19 at 16:01
  • So I don't understand who made the question downvoted...certainly not a PHP expert. – Serge Oct 12 '19 at 16:05
  • It is a very bad idea to use `die(mysqli_error($conn));` in your code, because it could potentially leak sensitive information. See this post for more explanation: [mysqli or die, does it have to die?](https://stackoverflow.com/a/15320411/1839439) – Dharman Oct 13 '19 at 10:54

1 Answers1

0

MySQL queries have the clause limit o, n, so you can run

SELECT id FROM TableName limit 0, 10

for example to get only 10 elements from the start. Now, the first number is the offset (the index where you start your work from) and the second is the number of elements you would expect to get. Now, these are the ideas you need to know in order to have success in doing this:

  • you will need to write a loop
  • in the loop you will always get n elements (n could be 1 as you wanted, or more)
  • in each step you increment o by n, so the new offset will be starting where the results ended previously
  • you can ensure an order, like order by id, for example
  • you can wrap the loop we are speaking about here around most of your code
Lajos Arpad
  • 64,414
  • 37
  • 100
  • 175