In this project, I need to call an external API on a time basis: for one day, I may need to call it 24 times, one call per one-hour period. Each call returns an XML document with 6 fields per record, and I need to insert that data into a table. On average, each hour produces about 20,000 rows.
The table has these 6 columns:
col1, col2, col3, col4, col5, col6
When all 6 columns match, two rows are considered duplicates, and duplicates must not be inserted.
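For reference, the entity is shaped roughly like this (a simplified sketch: the Id key and the string types are placeholders, not the real schema):

public class CallData
{
    public int Id { get; set; }        // surrogate primary key (assumed)
    public string col1 { get; set; }
    public string col2 { get; set; }
    public string col3 { get; set; }
    public string col4 { get; set; }
    public string col5 { get; set; }
    public string col6 { get; set; }
}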
I'm using C# and Entity Framework for this:
var dic = new Dictionary<string, bool>(); // keys already added in the current chunk

foreach (XmlNode node in nodes)
{
    try
    {
        count++;
        CallData data = new CallData();
        ...
        // get all data and set in 'data'

        // check whether the row exists in the database already
        bool exists = ctx.CallDatas.Any(x => x.col1 == data.col1
                                          && x.col2 == data.col2
                                          && x.col3 == data.col3
                                          && x.col4 == data.col4
                                          && x.col5 == data.col5
                                          && x.col6 == data.col6);
        if (exists)
        {
            // exists in database, skip
            // log info
        }
        else
        {
            string key = $"{data.col1}|{data.col2}|{data.col3}|{data.col4}|{data.col5}|{data.col6}";
            // check whether the row is in the current chunk already
            if (dic.ContainsKey(key))
            {
                // in current chunk, skip
                // log info
            }
            else
            {
                // insert
                ctx.CallDatas.Add(data);
                // remember the key for this chunk
                dic.Add(key, true);
            }
        }
    }
    catch (Exception ex)
    {
        // log error
    }
}
Logger.InfoFormat("Saving changes ...");
if (ctx.ChangeTracker.HasChanges())
{
await ctx.SaveChangesAsync();
}
Logger.InfoFormat("Saving changes ... Done.");
The code works correctly. However, we now need to run it as a backfill over the past several months of data, and it is slow: every row triggers a separate database query to check whether it already exists.
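One idea I am considering is to query the existing keys once per run and do all duplicate checks in memory, instead of issuing one query per row. A rough sketch of what I mean (it assumes the whole key set fits in memory; the projection pulls only the 6 columns and the key string is built client-side so it works whatever the column types are):

// Load every existing key once, instead of one query per row.
var existing = new HashSet<string>(
    ctx.CallDatas
        .Select(x => new { x.col1, x.col2, x.col3, x.col4, x.col5, x.col6 })
        .AsEnumerable()
        .Select(x => $"{x.col1}|{x.col2}|{x.col3}|{x.col4}|{x.col5}|{x.col6}"));

foreach (XmlNode node in nodes)
{
    CallData data = new CallData();
    // ... populate 'data' from 'node' as before ...

    string key = $"{data.col1}|{data.col2}|{data.col3}|{data.col4}|{data.col5}|{data.col6}";

    // HashSet<string>.Add returns false when the key is already present,
    // so one check covers both "already in the database" and
    // "already added earlier in this chunk".
    if (existing.Add(key))
    {
        ctx.CallDatas.Add(data);
    }
}

await ctx.SaveChangesAsync();

But I am not sure this scales once the table holds several months of data.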
Are there any suggestions to improve the performance?
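P.S. Would turning off automatic change detection while adding entities help here as well? Something like this (EF6 syntax; in EF Core the equivalent flag is ctx.ChangeTracker.AutoDetectChangesEnabled):

// Avoid a full DetectChanges scan on every Add; re-enable the flag
// before saving so SaveChanges still picks up the new entities.
ctx.Configuration.AutoDetectChangesEnabled = false;
try
{
    // ... the Add loop from above ...
}
finally
{
    ctx.Configuration.AutoDetectChangesEnabled = true;
}
await ctx.SaveChangesAsync();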
Thanks