- Use one of the existing CSV parsers
- Parse each row to a mapped class object
- Override `Equals` and `GetHashCode` for your object
- Keep a `List<T>` or `HashSet<T>` in memory, initialized empty at the start.
- On reading each line from the CSV file, check whether it already exists in your in-memory collection (List, HashSet).
- If the object doesn't exist in your in-memory collection, add it to the collection and insert it into the database.
- If the object exists in your in-memory collection, ignore it. The check relies on your `Equals` and `GetHashCode` implementation, so it is as simple as `if (inMemoryCollection.Contains(currentRowObject))` (see the sketch after this list).
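A minimal sketch of those steps in C#, assuming a two-column file (the `CsvRecord` fields, the naive `Split`-based parsing, and the `InsertIntoDatabase` stub are all placeholders; use a real parser such as CsvHelper in practice):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical class mapped to one CSV row; field names are placeholders.
public class CsvRecord
{
    public string Name  { get; set; }
    public string Email { get; set; }

    // Two rows count as duplicates when all mapped fields match.
    public override bool Equals(object obj) =>
        obj is CsvRecord other && Name == other.Name && Email == other.Email;

    // Must be consistent with Equals, or HashSet lookups will misbehave.
    public override int GetHashCode() => HashCode.Combine(Name, Email);
}

public static class CsvDeduplicator
{
    public static void Process(string csvPath, HashSet<CsvRecord> seen)
    {
        foreach (var line in File.ReadLines(csvPath))
        {
            // Naive split for illustration; a real CSV parser handles
            // quoting, escaped commas, headers, etc.
            var parts = line.Split(',');
            var record = new CsvRecord { Name = parts[0], Email = parts[1] };

            if (seen.Contains(record))
                continue; // duplicate row - ignore it

            seen.Add(record);
            InsertIntoDatabase(record); // your DB insert routine (stub)
        }
    }

    static void InsertIntoDatabase(CsvRecord record) { /* ... */ }
}
```

Note that `HashSet<T>.Add` returns `false` when the item is already present, so the `Contains` check and the `Add` can also be collapsed into `if (seen.Add(record)) InsertIntoDatabase(record);`.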
I guess you have a Windows service reading CSV files periodically from a file location. You can repeat the above process every time you read a new CSV file. This way you maintain an in-memory collection of the previously inserted objects and ignore duplicates, irrespective of where they appear in the CSV files.
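A rough sketch of that loop, reusing the `Process` method above (the folder path, the polling interval, and the rename-after-processing step are assumptions):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

public static class CsvWatcherService
{
    public static void RunPollingLoop(CancellationToken token)
    {
        // The same HashSet instance survives across files, so rows already
        // inserted from an earlier CSV are recognized and skipped.
        var seen = new HashSet<CsvRecord>();

        while (!token.IsCancellationRequested)
        {
            // Folder and interval are placeholders - adjust to your setup.
            foreach (var csvPath in Directory.GetFiles(@"C:\feeds", "*.csv"))
            {
                CsvDeduplicator.Process(csvPath, seen);
                File.Move(csvPath, csvPath + ".done"); // don't re-read this file
            }
            token.WaitHandle.WaitOne(TimeSpan.FromMinutes(5));
        }
    }
}
```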
If you have a primary key defined for your data, you can use a `Dictionary<TKey, TValue>` where the key is the unique field. Key lookups are faster than full-object comparisons, and you can skip the `Equals` and `GetHashCode` implementation altogether.
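A variant of the loop keyed on the primary key (this assumes the mapped class also exposes a unique `Id` property, and `ParseCsv` is a hypothetical helper that yields one mapped object per row):

```csharp
// Dedup keyed on the primary key - no Equals/GetHashCode overrides needed.
var seenById = new Dictionary<int, CsvRecord>();

foreach (var record in ParseCsv(csvPath)) // ParseCsv: hypothetical helper
{
    if (seenById.ContainsKey(record.Id)) // Id: assumed unique key column
        continue; // key already seen - ignore the row

    seenById.Add(record.Id, record);
    InsertIntoDatabase(record);
}
```

On .NET Core 2.0 and later, the `ContainsKey`/`Add` pair collapses into a single `seenById.TryAdd(record.Id, record)` call, which returns `false` for duplicate keys.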
As a backup to this process, your DB writing routine/stored procedure should be defined so that it first checks whether the record already exists in the table; if so, it updates the row, otherwise it inserts a new record. This is an UPSERT.
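A sketch of that fallback for SQL Server (table and column names are placeholders, and the SQL is inlined here for brevity; the same logic can live in a stored procedure or a `MERGE` statement instead):

```csharp
using Microsoft.Data.SqlClient; // System.Data.SqlClient on .NET Framework

public static class RecordWriter
{
    // Placeholder table/column names; the WHERE clause should use
    // whatever uniquely identifies a record in your schema.
    const string UpsertSql = @"
        IF EXISTS (SELECT 1 FROM Records WHERE Email = @Email)
            UPDATE Records SET Name = @Name WHERE Email = @Email
        ELSE
            INSERT INTO Records (Name, Email) VALUES (@Name, @Email)";

    public static void Upsert(SqlConnection conn, CsvRecord record)
    {
        using var cmd = new SqlCommand(UpsertSql, conn);
        cmd.Parameters.AddWithValue("@Name", record.Name);
        cmd.Parameters.AddWithValue("@Email", record.Email);
        cmd.ExecuteNonQuery();
    }
}
```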
Remember, if you end up maintaining an in-memory collection, keep clearing it periodically, otherwise you could end up with an `OutOfMemoryException`.
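One crude way to bound the cache (the threshold is an arbitrary assumption); once cleared, the UPSERT above still catches any duplicates the cache has forgotten:

```csharp
const int MaxCachedRecords = 1_000_000; // arbitrary cap - tune for your data

if (seen.Count > MaxCachedRecords)
    seen.Clear(); // duplicates are still caught by the DB-side UPSERT
```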