-2

Let assume we have 2 tables:

table name: hids

hid status
001 2
002 1
003 1
004 2
005 2

... unique on hid

and: table name: times

hid uid time
001 00001 12345
001 00001 12567
001 00002 12540
001 00003 12541
001 00003 12567
002 00001 12575

... (no uniques)

The problem is: For a given user (eg. uid=00001) UPDATE status in "hids" with 0 if:

  • current status is 2
  • in the "times" table there isn't any record for any other user with time > (time of the latest entry for uid=00001 and the same hid)

Currently I do it with PHP is a way which is not too effecitve (thousends of atom queries). As the database grows over time (even for several milion records) the code is ineffective due to PHP overhead. Is there any option to make it simpler?

Your Common Sense
  • 156,878
  • 40
  • 214
  • 345
Gajowy
  • 23
  • 3
  • 1
    **Warning:** You are wide open to [SQL Injections](https://php.net/manual/en/security.database.sql-injection.php) and should use parameterized **prepared statements** instead of manually building your queries. They are provided by [PDO](https://php.net/manual/pdo.prepared-statements.php) or by [MySQLi](https://php.net/manual/mysqli.quickstart.prepared-statements.php). Never trust any kind of input! Even when your queries are executed only by trusted users, [you are still in risk of corrupting your data](http://bobby-tables.com/). [Escaping is not enough!](https://stackoverflow.com/q/32391315) – Dharman Feb 01 '23 at 11:42
  • 1
    You really should not be using `multi_query()`. It's a terrible function that will cause you a lot of problems. Please learn how to do it properly with prepared statements. – Dharman Feb 01 '23 at 11:44
  • 1
    Use `WHERE ... AND EXISTS (subselect to check your "times" table conditions)` in your UPDATE statement ...? Or NOT EXISTS, depending on how you want to phrase the subselect. – CBroe Feb 01 '23 at 11:58
  • @Dharman well judging by the code, it seems they know what they're doing. Not very pedagogical but technically it works. – Your Common Sense Feb 01 '23 at 12:09
  • I second CBroe, it could be a single atomic update with join. I would recommend to re-ask this question, without any PHP, leaving only table definitions and your attempted SQL solution – Your Common Sense Feb 01 '23 at 12:55
  • @Dharman - Thank you for the warning. It's never too much to reming about this rule, however in this case there is no risk - substituted values comes from the database and are all numerical fields (in the tables and in the substitution - see '%d' in spritnf). I use parametrization in other part of the code, however have some problems when it is a multiquery ($updQry in the code) - but this is another issue I still belive to solve by myself. – Gajowy Feb 01 '23 at 14:05

1 Answers1

1

As noted in the comments, you should make the switch to using parameterized prepared statements. Given that you are currently using sprintf to inject your variables into your SQL, it will be a very small change.


You could significantly reduce the overhead of your current process by only returning the maximum time per hid for the given user -

SELECT times.hid, MAX(times.time) AS max_time
FROM times
JOIN hids ON times.hid = hids.hid
WHERE times.uid = 1
AND hids.status = 2
GROUP BY times.hid;

But a better option is to JOIN hids and times and then use NOT EXISTS (or LEFT JOIN IS NULL) to find where there is no other user with a greater time -

UPDATE hids h
JOIN times t ON h.hid = t.hid
SET h.status = 0
WHERE t.uid = 1
AND h.status = 2
AND NOT EXISTS (
    SELECT 1 FROM times WHERE hid = t.hid AND uid <> t.uid AND time > t.time
)

Depending on the distribution of your data and how it is indexed you will probably get better performance by pre-calculating the max time per hid -

UPDATE hids h
JOIN (
    SELECT t.hid, t.uid, MAX(time) AS max_time
    FROM hids h
    JOIN times t ON h.hid = t.hid
    WHERE t.uid = 1
    AND h.status = 2
    GROUP BY t.hid, t.uid
) t ON h.hid = t.hid
SET h.status = 0
WHERE NOT EXISTS (
    SELECT 1 FROM times WHERE hid = t.hid AND uid <> t.uid AND time > t.max_time
);
user1191247
  • 10,808
  • 2
  • 22
  • 32
  • The second code works smoothly and is probably the optimal one. It is because the SELECT 1 ... substatement evaluates to NOT EXISTS in most cases. Just had to add index for hid, uid and time. Without indexes it takes app. 30 minutes for my current enviroment, and with... 0.4 seconds ;-). **That's great, thank you!** – Gajowy Feb 01 '23 at 15:37