Please bear with me: this question is not perfectly formed and may not contain enough data for you to pinpoint a cause. I am simply looking for ideas on how to continue solving this problem. Read it as a horror story.

Problem Description

I have a C# program that interacts with an operator through button clicks, communicates over TCP/IP with a set of 4 barcode scanners, and performs some SQL operations. It is used in a somewhat automated manufacturing setting. The barcode scanners come with a communications library that triggers barcode reading and aggregates the data from 4 (or more) scanners into a single data stream to a client (my C# program). Each scanner provides its scanner ID as well as the scanned data, for example: 001:111111;004:444444;003:333333;002:222222.... Here 001, 004, 003, and 002 are the scanner IDs, while 111111, 444444, 333333, and 222222 are the barcode data read at the corresponding scanners.
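As a minimal sketch of how a stream in this format could be parsed and sanity-checked on the client side before any SQL runs: the class name `ScannerPayload` is hypothetical, and the rule that no barcode may appear at two scanners in one read is an assumption based on the scrambled sample in this question, not something the scanner library documents.

```csharp
using System;
using System.Collections.Generic;

public static class ScannerPayload
{
    // Parses "001:111111;004:444444;..." into a scanner ID -> barcode map.
    // Throws FormatException for a malformed field, a repeated scanner ID,
    // or a barcode reported by more than one scanner (assumed invalid).
    public static Dictionary<string, string> Parse(string payload)
    {
        var byScanner = new Dictionary<string, string>();
        var seenCodes = new HashSet<string>();

        var fields = payload.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var field in fields)
        {
            var parts = field.Split(':');
            if (parts.Length != 2 || parts[0].Length == 0 || parts[1].Length == 0)
                throw new FormatException("Malformed field '" + field + "'.");

            if (byScanner.ContainsKey(parts[0]))
                throw new FormatException("Scanner ID '" + parts[0] + "' appears twice.");

            // HashSet.Add returns false when the barcode was already seen.
            if (!seenCodes.Add(parts[1]))
                throw new FormatException("Barcode '" + parts[1] + "' appears at two scanners.");

            byScanner.Add(parts[0], parts[1]);
        }
        return byScanner;
    }
}
```

Rejecting a payload at this point means a scrambled read is logged and retried rather than written to the database.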

I apologize for all these details, but they come into play.

We run this program about 1000 times a day, mostly successfully. But about 0.2% of the time, something unexpected happens.

Normal program flow (99.8%):
SQL Connection Open
User button press
Scanner trigger
Scanner returns data
SQL Operations (New code Registered)

Abnormal program flow (0.2%):
SQL Connection Open
User button press
Scanner trigger
Scanner returns incorrect data
SQL Operations
Program rewinds back to start
SQL Connection Open
User button press bypassed
Scanner trigger bypassed
Scanner returns GOOD data
SQL Operations

Here is a captured sequence of events from the log, with my comments after "<--":

SQL Connection Open.
K-----e Scanner LF Connect success? True
K-----e Scanner RF Connect success? True
K-----e Scanner LB Connect success? True
K-----e Scanner RB Connect success? True
New code Registered: 785889<=>819345   <-- wrong data
New code Registered: 917890<=>481899   <-- wrong data
New code Registered: 249447<=>999731   <-- wrong data
New code Registered: 967082<=>386511   <-- wrong data
New code Registered: 794079<=>772860   <-- wrong data
New code Registered: 349467<=>421658   <-- wrong data
New code Registered: 810132<=>525941   <-- wrong data
New code Registered: 879309<=>105578   <-- wrong data
SQL Connection Open.   <-- Rewind back to start of cycle, all without any user interaction
K-----e Scanner LF Connect success? True
K-----e Scanner RF Connect success? True
K-----e Scanner LB Connect success? True
K-----e Scanner RB Connect success? True
785889 is not unique.   <-- Data is good now; DB ops behave correctly, since all scanned data was already inserted into the DB
Already Exist 785889
819345 is not unique.
Already Exist 819345
917890 is not unique.
Already Exist 917890
525941 is not unique.
Already Exist 525941
249447 is not unique.
Already Exist 249447
105578 is not unique.
Already Exist 105578
967082 is not unique.
Already Exist 967082
481899 is not unique.
Already Exist 481899
794079 is not unique.
Already Exist 794079
421658 is not unique.
Already Exist 421658
349467 is not unique.
Already Exist 349467
772860 is not unique.
Already Exist 772860
810132 is not unique.
Already Exist 810132
386511 is not unique.
Already Exist 386511
879309 is not unique.
Already Exist 879309
999731 is not unique.
Already Exist 999731

Known Issues

After debugging (which is difficult due to the 0.2% occurrence), the scanner communications library is implicated in the wrong (scrambled) data: 001:222222;004:111111;003:222222;002:333333, etc. I am concerned that the data is bad, but I am much more concerned about the program rewind.

Question

What mechanisms or conditions could result in repeated code execution, triggered by an external library, in a C# Windows Forms application? How can I detect and trap such events?

Conclusion

My apologies for the long and incomplete description of this problem; I have included all the information that I could gather. It is certainly beyond normal to see this happen, and repeatedly. I hope to gather some information from your replies to help me further diagnose or fix this problem.

I have discussed this problem with my local scanner rep, but software libraries are provided as-is. These scanners are $10K each, but this is now a problem that I have to solve.

Lee Taylor
  • It's really hard to guess without seeing the actual code you are using. Could be something event related. Could be that you pass a function into an external library and this function gets executed wherever the library pleases. Could you just simulate incorrect data input to reproduce and put a breakpoint on the rewind entry point, then monitor the call stack and see what exactly happens? – bashis May 16 '22 at 17:03
  • Also, please specify what "program rewinds to the start" actually stands for. My guess is that your application is a WinForms app that uses an external library, so the entry point method of the application would be `Program.Main`. Does this method restart? If not, please provide the code that actually restarts and how the library observes it. – bashis May 16 '22 at 17:10
  • Well, good logging is a life saver in such a scenario. If I were you, I would refactor all my functions, methods, and any interface with the external system (the scanner in your case) to log every event, so that I first understand whether this anomaly is caused by the scanners or by my code. I believe you have to break your debugging process into smaller pieces and do it with extreme patience. – Reza Heidari May 16 '22 at 17:16
  • Mohi, logging was helpful; perhaps we could do more. We were able to use logging to identify that the data coming from the scanner SDK was scrambled. What is not explained is that the scanner SDK seems to know this and tries to correct it by resetting our state machine... The state machine uses a ushort to control the execution, and that variable is not passed to the scanner SDK. – LC77StrangerThanLife May 16 '22 at 17:57
  • Please provide enough code so others can better understand or reproduce the problem. – Community May 17 '22 at 00:39
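The suggestions in the comments above (log every event, and inspect the call stack at the point where the cycle rewinds) could be sketched roughly as follows. `CycleStateGuard` is a hypothetical wrapper, not part of any scanner SDK, and it assumes that the ushort state only increases within one cycle, so that any decrease counts as a rewind.

```csharp
using System;

public sealed class CycleStateGuard
{
    private readonly Action<string> _log;
    private readonly int _ownerThreadId;
    private ushort _state;

    // Construct this on the thread that normally drives the state
    // machine (for a WinForms app, typically the UI thread).
    public CycleStateGuard(Action<string> log)
    {
        _log = log;
        _ownerThreadId = Environment.CurrentManagedThreadId;
    }

    public ushort State
    {
        get { return _state; }
        set
        {
            int threadId = Environment.CurrentManagedThreadId;
            bool foreignThread = threadId != _ownerThreadId;
            bool rewind = value < _state; // assumption: states only increase in a cycle

            if (foreignThread || rewind)
            {
                // Environment.StackTrace shows who is writing the state,
                // for example a library callback or a re-entered handler.
                _log("State " + _state + " -> " + value +
                     " on thread " + threadId + " (owner " + _ownerThreadId + ")" +
                     Environment.NewLine + Environment.StackTrace);
            }
            _state = value;
        }
    }
}
```

The legitimate reset at the end of each cycle is logged as well, which gives a baseline stack trace to compare against the one captured during the 0.2% case.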

1 Answer

Here's one idea: The barcode scans are probably event-driven. If the events occur too closely together, there's no guarantee that one will complete its query before the next event triggers a different query. There's an easy way to make a thread wait for an action to complete, using a basic synchronization object: System.Threading.SemaphoreSlim.

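using System;
using System.Diagnostics;
using System.Threading;

// One permit: only one SQL operation may run at a time.
private readonly SemaphoreSlim _ssEnforceSingleOperation = new SemaphoreSlim(1, 1);

public void DoSomeSqlOperations()
{
    // Acquire outside the try block so that the finally block only
    // releases a semaphore that was actually acquired.
    _ssEnforceSingleOperation.Wait();
    try
    {
        using (var cnx = new SQLite.SQLiteConnection(ConnectionString))
        {
            // Perform the query
        }
    }
    catch (Exception ex)
    {
        // Surfaces the error in debug builds only.
        Debug.Assert(false, ex.Message);
    }
    finally
    {
        // Ensure the semaphore releases even if an error occurs.
        _ssEnforceSingleOperation.Release();
    }
}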

If an operation is already in progress, the new one won't begin until the first one completes and releases the semaphore. My suggestion would be to protect your critical sections in this manner and see if it helps.

IVSoftware
  • I should note that if you happen to be using an async connection for the SQL, then the semaphore also needs to be awaited asynchronously. – IVSoftware May 16 '22 at 17:44
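For the asynchronous case mentioned in the comment above, a minimal sketch might look like this. `SingleOperationGate` and `RunAsync` are hypothetical names; the operation passed in stands for whatever async SQL call the program actually makes.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class SingleOperationGate
{
    // One permit: at most one operation runs at a time.
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);

    public async Task RunAsync(Func<Task> operation)
    {
        // WaitAsync yields instead of blocking the calling (UI) thread.
        await _gate.WaitAsync();
        try
        {
            await operation();
        }
        finally
        {
            // Released only after a successful acquire, even if the operation throws.
            _gate.Release();
        }
    }
}
```

Awaiting the semaphore rather than calling `Wait()` matters in a Windows Forms program because a blocking wait on the UI thread would freeze the form while another operation holds the permit.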