I have a SQL table that is filled with pending orders on day one. The next day, at a specific time, I have to fetch the 'PENDING' orders from the DB and process each of them by calling an external API. This is what my code flow looks like:
SELECT * FROM orders WHERE status = 'PENDING' LIMIT 200
Now I call the external API for each of those orders, which returns success or failure per order, and then I update the order statuses in the DB:
UPDATE orders SET status = 'COMPLETED' WHERE id = ANY(<success list>)
UPDATE orders SET status = 'FAILED' WHERE id = ANY(<failure list>)
The above flow runs repeatedly until the SELECT query returns 0 rows. I've put a LIMIT on the query to avoid memory issues and to stay within the external API's throughput limits.
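To make the flow concrete, here is a minimal sketch of the batch step in Go. The `Order` struct, the even/odd failure rule in `callExternalAPI`, and the function names are all placeholders I made up for illustration; the real schema and API are not shown in the question.

```go
package main

import "fmt"

// Order mirrors a row from the orders table (hypothetical layout;
// the real table definition is not shown in the question).
type Order struct {
	ID     int64
	Status string
}

// callExternalAPI stands in for the real external API. This stub
// rejects even-numbered orders so both branches get exercised.
func callExternalAPI(o Order) error {
	if o.ID%2 == 0 {
		return fmt.Errorf("order %d rejected", o.ID)
	}
	return nil
}

// processBatch partitions one fetched batch into the id lists fed to
// the two UPDATE statements above.
func processBatch(batch []Order) (success, failure []int64) {
	for _, o := range batch {
		if err := callExternalAPI(o); err != nil {
			failure = append(failure, o.ID)
		} else {
			success = append(success, o.ID)
		}
	}
	return success, failure
}

func main() {
	batch := []Order{{1, "PENDING"}, {2, "PENDING"}, {3, "PENDING"}}
	success, failure := processBatch(batch)
	fmt.Println("COMPLETED:", success) // COMPLETED: [1 3]
	fmt.Println("FAILED:", failure)    // FAILED: [2]
}
```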
There are a couple of issues with this flow:
- Let's say my code executed the SELECT query and started processing the orders. What if my service crashes partway through? Some orders will already have gone through the API and received success or failure responses, but their status updates never reached the DB. When my service restarts, it will pick those orders up and process them again, which I don't want.
- My service runs as multiple instances, so the same orders with status = 'PENDING' can be picked up by several instances at once, leading to double processing of the same order. How do I avoid this?
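The second issue boils down to an interleaving where both instances run the SELECT before either runs its UPDATE. A toy in-memory simulation (no real DB, all names invented for illustration) shows the resulting double processing:

```go
package main

import "fmt"

// table is a toy in-memory stand-in for the orders table,
// used only to demonstrate the race between instances.
type table struct{ status map[int64]string }

func (t *table) selectPending() []int64 {
	var ids []int64
	for id, s := range t.status {
		if s == "PENDING" {
			ids = append(ids, id)
		}
	}
	return ids
}

func (t *table) markCompleted(ids []int64) {
	for _, id := range ids {
		t.status[id] = "COMPLETED"
	}
}

// simulateTwoInstances interleaves two instances the way described
// above: each runs its SELECT before either runs its UPDATE, so both
// see the same PENDING rows. It returns how many times each order id
// was "sent to the API".
func simulateTwoInstances() map[int64]int {
	t := &table{status: map[int64]string{1: "PENDING", 2: "PENDING"}}
	calls := map[int64]int{}

	b1 := t.selectPending() // instance A selects
	b2 := t.selectPending() // instance B selects the same rows

	for _, batch := range [][]int64{b1, b2} {
		for _, id := range batch {
			calls[id]++ // stand-in for the external API call
		}
		t.markCompleted(batch)
	}
	return calls
}

func main() {
	fmt.Println(simulateTwoInstances()) // map[1:2 2:2]
}
```

Each order hits the "API" twice even though every UPDATE succeeded, because the SELECT alone does not claim the rows.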
If it helps, my tech stack is Go and PostgreSQL. I'm sure these are common issues and there must be standard ways to approach them. I'm open to changing any part, whether it's the Go code or the DB side, including locks or transactions. I just want to know which direction to look in for a solution. Any help would be appreciated.