> I don't want the database to get too big and unwieldy
Database scaling is a big topic, but it still fits in the realm of optimization, which can be summed up with three simple rules:
- Don't
- Don't
- (Experts Only) Profile First!
What this means for your question is that you probably shouldn't be optimizing for the size of your data until you have a good idea of:
- How much data do you really have?
- What are the queries you execute regularly on that data, which queries are slow?
- What can your database do natively to help?
What might seem at first blush to be a lot of data is often nothing to worry about. As a good rule of thumb: if your dataset fits in memory, you don't have a big dataset.
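You can usually answer the first two questions with a couple of quick queries. Here's a rough sketch using SQLite from Python; the `tasks` table and its columns are made up for illustration, and other databases have their own profiling tools (e.g. `EXPLAIN ANALYZE` in PostgreSQL):

```python
import sqlite3

# Hypothetical example: a "tasks" table whose rows eventually get marked completed.
conn = sqlite3.connect("app.db")

# 1. How much data do you really have?
total, pending = conn.execute(
    "SELECT COUNT(*), SUM(status != 'completed') FROM tasks"
).fetchone()
print(f"{total} rows total, {pending} still pending")

# 2. Is the query you run all the time actually slow?
#    EXPLAIN QUERY PLAN is SQLite-specific; Postgres/MySQL use EXPLAIN / EXPLAIN ANALYZE.
for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM tasks WHERE status != 'completed'"
):
    print(row)
```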
Even if you do have a big dataset, it's often the case that only a small part of it (say, the non-"completed" rows) really affects your queries. You can make that work well just by creating the right combination of indexes, so that your database can easily find and operate on the rows you actually query.
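A partial index is one way to do this: only the unfinished rows get indexed, so the index stays small no matter how large the table grows. A minimal sketch, again assuming a hypothetical `tasks` table (partial indexes exist in SQLite and PostgreSQL; MySQL would need a different approach):

```python
import sqlite3

conn = sqlite3.connect("app.db")

# Index only the rows that still need processing; completed rows never
# enter the index, so it stays small even as the table keeps growing.
conn.execute(
    """
    CREATE INDEX IF NOT EXISTS idx_tasks_pending
    ON tasks (created_at)
    WHERE status != 'completed'
    """
)

# Queries that filter the same way can now use the index.
rows = conn.execute(
    "SELECT id FROM tasks WHERE status != 'completed' ORDER BY created_at LIMIT 100"
).fetchall()
```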
And it might be that you are using a database for the wrong thing. What you are describing, where some data comes in, hangs around until it gets processed, and then gets archived, sounds suspiciously like a queue. Persistent and distributed queues are widely available (have a look at Celery for a Python framework built on queuing) and may be a better fit for the problem you are trying to solve.
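A minimal sketch of what that looks like with Celery (the broker URL, task name, and task body are placeholders; Celery's own tutorial covers the real setup):

```python
# tasks.py -- minimal Celery sketch; broker URL and task body are placeholders.
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def process_item(item_id):
    # Do the work, archive the result wherever it belongs, and the message
    # disappears from the queue -- nothing piles up in your primary database.
    ...

# Elsewhere in your application, when new data arrives:
# process_item.delay(item_id)
```

You then run one or more workers with `celery -A tasks worker`, and the queue absorbs the backlog instead of your database.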