I'm maintaining a database that stores data transfer information between different networks. Each data transfer is logged, and at the end of each month I run a Perl script that loads the log files into a table in the database. I did not design the Perl script or the database schema; both were done before I joined the project.
I used this link to retrieve the primary keys of the table (the table is named usage_detail), and it gave me nothing, so the table apparently has no primary key. Since there are so many records in the table, it's not easy to keep track of duplicates. We've had cases where a lot of duplicates were loaded (because of bugs in the script that does the logging for each transfer, but that's a topic for another question), and we ended up having to drop the latest load and reload everything after fixing the log files. As you may have guessed, this is tedious and error-prone.
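(In case it matters, the kind of lookup I mean is roughly the standard information_schema query below; I'm not sure it is identical to whatever the link used, but this is the idea.)

    -- A standard way to list the primary-key columns of usage_detail;
    -- not necessarily the exact query from the link above.
    SELECT kcu.column_name
    FROM information_schema.table_constraints tc
    JOIN information_schema.key_column_usage kcu
      ON kcu.constraint_name = tc.constraint_name
     AND kcu.table_name = tc.table_name
    WHERE tc.constraint_type = 'PRIMARY KEY'
      AND tc.table_name = 'usage_detail';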
To fix this duplicate problem, I would like to add a primary key to the table. For several reasons, we don't want to add a whole new column just to hold a surrogate key. After looking at the fields, I've come up with a multi-column (composite) primary key consisting of: the transfer start timestamp, the transfer end timestamp, and the name of the file transferred (which includes the full path). It seems highly unlikely that two records would ever share all three of those values.
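To sanity-check that assumption, this is roughly the duplicate check I had in mind; the column names (transfer_start, transfer_end, file_path) are my guesses, since I don't have the schema in front of me:

    -- Hypothetical column names. Groups rows by the proposed key and lists
    -- any combination that occurs more than once, i.e. the rows that would
    -- prevent the primary key from being created.
    SELECT transfer_start, transfer_end, file_path, count(*) AS dup_count
    FROM usage_detail
    GROUP BY transfer_start, transfer_end, file_path
    HAVING count(*) > 1;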
Here are my questions:

1) If I add this primary key to the table, what will happen to any duplicates that are already present?
2) How would I actually add this primary key to the table? (We are using PostgreSQL 8.1.22.) My rough guess at the statement is in the sketch after this list.
3) After the primary key is added, suppose the load script tries to insert a duplicate. What sort of error would PostgreSQL throw, and would I be able to catch it in the script?
4) I know you don't have much information about the load script, but given what I have described, do you foresee anything in it that might need to be changed?
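For reference, here is my rough guess at the statement for question 2, using the same hypothetical column names as above. My understanding is that it will refuse to run if duplicates (or NULLs) already exist in those columns, which is partly why I'm asking question 1; and I assume that once it's in place, a duplicate insert from the load script would fail with some kind of unique/duplicate-key violation, which is what question 3 is about.

    -- Hypothetical column names again; the real ones are whatever the
    -- original schema defines.
    ALTER TABLE usage_detail
        ADD CONSTRAINT usage_detail_pkey
        PRIMARY KEY (transfer_start, transfer_end, file_path);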
Any help is greatly appreciated. Thanks.