I am looking at ideas for building an ETL pipeline to populate PostgreSQL tables. I expect to be loading tens of thousands of rows per batch, so not a huge amount of data, but big enough to be worth thinking through before implementing anything.
I expect that in the vast majority of cases the incoming data will be clean and should insert without issue. However, I would not be surprised if every now and then a batch contains one or two problem records: perhaps a duplicate key, a character value longer than the column allows, or a missing value for a NOT NULL column. That type of thing.
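To make that concrete, here is a hypothetical batch in the shape I have in mind; the table and column names (`id`, `name`, `amount`) are made up for illustration and are not my actual schema:

```python
# Hypothetical sample batch: mostly clean rows, plus the kinds of problems
# described above. Column names are illustrative only.
batch = [
    {"id": 1, "name": "alice", "amount": 10.0},    # clean
    {"id": 2, "name": "bob", "amount": 20.0},      # clean
    {"id": 2, "name": "carol", "amount": 30.0},    # duplicate key (id repeats)
    {"id": 3, "name": "x" * 500, "amount": 40.0},  # char value longer than the column allows
    {"id": 4, "name": "dave", "amount": None},     # missing value for a NOT NULL column
]
```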
When this happens, I would like all of the non-erroring rows to be inserted successfully and my code to somehow determine which row(s) failed.
I know I can do this by inserting each record in its own transaction, but that seems very inefficient in the common case, where every record is expected to insert cleanly. A rough sketch of that fallback is below.
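This is roughly what I mean by the one-transaction-per-row approach, sketched with psycopg2; the connection string and table name are placeholders, and the `batch` list is the hypothetical sample from above:

```python
import psycopg2

# Sketch of the one-transaction-per-row fallback. Connection string,
# table name, and columns are placeholders for illustration.
conn = psycopg2.connect("dbname=etl_target")
conn.autocommit = True  # each INSERT runs (and commits) in its own transaction

failed = []
with conn.cursor() as cur:
    for row in batch:  # `batch` as in the sample data above
        try:
            cur.execute(
                "INSERT INTO target_table (id, name, amount) VALUES (%s, %s, %s)",
                (row["id"], row["name"], row["amount"]),
            )
        except psycopg2.errors.UniqueViolation as exc:
            failed.append((row, str(exc)))   # duplicate key
        except (psycopg2.DataError, psycopg2.IntegrityError) as exc:
            failed.append((row, str(exc)))   # value too long, NULL in NOT NULL, etc.

print(f"{len(batch) - len(failed)} rows inserted, {len(failed)} failed")
```

This gives me exactly the behavior I want (clean rows land, bad rows are reported), but at the cost of a round trip and a commit per row.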
What tools and strategies exist to accomplish these goals?