0

I'm not sure if i need to ask this question here or on https://softwareengineering.stackexchange.com/, but lets start with what i have at the moment.


Current situation

I'm maintaining a data converter that is converting record by record and a lot of them (30M+). I receive an id from the old database and after inserting i have the new primary key. These keys will be stored inside an Dictionary of some kind so when i need to find the new id i can find it. This is done by the following code (simplified example code, not the real one)

public class PersonConverter : Converter
{
    public bool Convert(int oldPersonId /*some more parameters*/)
    {
        int newPersonId;

        try
        {
            newPersonId = GetNewPersonIdForPerson(oldPersonId);
        }
        catch (SourceKeyNotFoundException)
        {
            SomeLogging("Log message");
            return true;
        }

        //lots of more thing are happening here!
        return true;
    }
}

public class Converter
{
    private Dictionary<int,int> _convPersonId;

    protected int GetNewPersonIdForPerson(int oldPersonId)
    {
        int key = oldPersonId;
        if (_convPersonId.TryGetValue(key, out int newPersonId))
            return newPersonId;
        throw new SourceKeyNotFoundException(key.ToString());
    }

    protected void SomeLogging(string message)
    {
        //Implement logging etc...
    }
}

Here in the PersonConverter we have a single call to GetNewPersonIdForPerson(oldPersonId) but in the real code there are lots of these calls to different dictionaries. My predecessor was fond of throwing exeptions as you can see. Performance wise this is not ideal, and according to Microsoft's website about Exceptions & Performance they suggest using the Tester-Doer Pattern or the Try-Parse pattern

The solution

solution 1

The solution i came up with is have the GetNewPersonIdForPerson(oldPersonId) return a int.MinValue or some other deterministic value and instead of the try/catch block use an if/else block checking for that value.

Take a loot at the new method GetNewPersonIdForPerson2(int oldPersonId)

protected int GetNewPersonIdForPerson2(int oldPersonId)
{
    int key = oldPersonId;
    if (_convPersonId.TryGetValue(key, out int newPersonId))
        return newPersonId;
    return int.MinValue;
}

and the way i call this instead of the try/catch block inside the Convert method of PersonConverter

if(GetNewPersonIdForPerson2(oldPersonId) != int.MinValue)
{
    newPersonId = GetNewPersonIdForPerson(oldPersonId); 
}
else
{
    SomeLogging("Log message");
    return true;
}

This solution has some problems because i need to call GetNewPersonIdForPerson2 twice event though i think performance wise this is quicker than throwing the exceptions.

solution 2

Another solution will be having an out variable on the GetNewPersonIdForPerson method like the following

protected bool GetNewPersonIdForPerson3(int oldPersonId, out int returnPersonId)
{
    int key = oldPersonId;
    if (_convPersonId.TryGetValue(key, out returnPersonId))
        return true;
    return false;
}

And do the following inside the Convert method of PersonConverter

if (!GetNewPersonIdForPerson3(oldPersonId, out newPersonId))
{
    SomeLogging("Log message");
    return true;
}

I haven't done anything jet to refactor this because I want some input on what is the best solution. I prefer Solution 1 because this is easier to refactor but this has 2 lookups in the same dictionary. I can't exactly tell how much of these Try/Catch blocks I have but there are lots of them. and the GetNewPersonIdForPerson method is not the only one i have (20+) don't know exaclty.

The question

Can anyone tell me what a good pattern is to fix this problem or is one the the two solutions i came up with the best one.

PS: With large conversions, 7.5M+ execeptions are throw according to the performance counter # of Exceps Thrown
PS 2: This is only some sample code and the dictionaries live forever unlike this example.

Jordy van Eijk
  • 2,718
  • 2
  • 19
  • 37

1 Answers1

1

If you’re going to use most or all of the Persons it’s probably better just to load all of the users in an initial load when you start up the application (or any other time that works well) and then do a quick look up each time you need it.

Performance wise TryGetValue is fast and you should not need to do any optimization. See the Benchmarks here: What is more efficient: Dictionary TryGetValue or ContainsKey+Item?

In a normal situation Try-Catch should not be that expensive performance wise as discussed in this thread:

In general, in today's implementations, entering a try block is not expensive at all (this was not always true). However, throwing and handling an exception is usually a relatively expensive operation. So, exceptions should normally be used for exceptional events, not normal flow control.

How often do we expect GetNewPersonIDForPerson() to fail?

With this information I would go with something similar to this:

public class PersonConverter : Converter
{
    private Dictionary<int, int> _IDs;

    public PersonConverter()
    {
        this._IDs = new Dictionary<int, int>();
    }

    public int Convert(int oldPersonID)
    {
        int newPersonID = int.MinValue;
        if (this._IDs.TryGetValue(oldPersonID, out newPersonID))
        {
            /* This oldPerson has been looked up before.
             * The TryGetValue is fast so just let's do that and return the newPersonID */
            return newPersonID;
        }
        else
        {
            try
            {
                /* This oldPerson has NOT been looked up before 
                * so we need to retrieve it from out source and update
                * the dictionary */
                int newPersonID = GetNewPersonIDForPerson(oldPersonID);
                this._IDs.Add(oldPersonID, newPersonID);
                return newPersonID;
            }
            catch (SourceKeyNotFoundException)
            {
                throw;
            }   
        }
    }
}

However, if you’re still worried about the Try-Catch statement I would probably check the GetNewPersonIDForPerson method. If it executes a query to a database it will probably return some value (0 or -1) if the record for the oldPersonID doesn’t exist and create my logic based on that data - Tester-Doer - instad of using Try-Catch.

I would also probably make Benchmark tests to see if there is any big difference with using a Try-Catch statement or not in this case. If there is a big difference in execution time I’d use the fastest one and if the difference is negligible I’d use the code that is easiest to understand and maintain.

A lot of time we try to optimize were it really isn't worth it.

Westerlund.io
  • 2,743
  • 5
  • 30
  • 37
  • I cannot lookup all up front because of the way the program is build and the data is given to me. I cannot influence the side that is giving me the information from the old system (source). for example the old system 25+ years old build in C++ without relational database etc.. Is just looping over records and giving it to me record by record. So lets say the source has 3M Orders there is a method that is called 3M times. And in each call i get the information about 1 record from the source. – Jordy van Eijk Jul 18 '18 at 11:22
  • `How often do we expect GetNewPersonIDForPerson() to fail?` For large conversions millions of times! – Jordy van Eijk Jul 18 '18 at 11:23
  • Given your information *I* can’t see another solution then to use `Try-Catch` statement if it doesn’t exist in the `_IDs` dictionary and then store the relationship between the IDs in the dictionary for faster future look-ups. – Westerlund.io Jul 18 '18 at 12:06
  • what i will do now is that when i persist the converted object to the database i add the `oldId` and the `newId` to the Dictionary. for simplicity i left this out of the example – Jordy van Eijk Jul 18 '18 at 14:19