I am developing a Windows Forms application that takes a robot program generated by other software and modifies it. The modification process is as follows:

  1. StreamReader.ReadLine() is used to parse the file line by line.
  2. Regex is used to search for specific keywords in the file. If a match is found, the matched string is copied to another string and replaced with new lines of robot code.
  3. The modified code is saved in a string and finally written to a new file.
  4. All of the matched strings found by Regex are also collected in a string and finally written to a separate new file.

I have been able to do this successfully with the following code:

    private void Form1_Load(object sender, EventArgs e)
    {
        string NextLine = null;
        string CurrLine = null;
        string MoveL_Pos_Data = null;
        string MoveL_Ref_Data = null;
        string MoveLFull = null;
        string ModCode = null;
        string TAB = "\t";
        string NewLine = "\r\n";
        string SavePath = null;
        string ExtCode_1 = null;
        string ExtCode_2 = null;
        string ExtCallMod = null;

        int MatchCount = 0;
        int NumRoutines = 0;

        try
        {
            // Ask the user for the location of the source file.
            // Display an OpenFileDialog so the user can select a MOD file.
            OpenFileDialog openFileDialog1 = new OpenFileDialog
            {
                Filter = "MOD Files|*.mod",
                Title = "Select an ABB RAPID MOD File"
            };

            // Show the Dialog.  
            // If the user clicked OK in the dialog and  
            // a .MOD file was selected, open it.  
            if (openFileDialog1.ShowDialog() == System.Windows.Forms.DialogResult.OK)
            {
                // Read the selected file line by line.
                using (StreamReader sr = new StreamReader(openFileDialog1.FileName))
                {
                    // define a regular expression to search for extr calls 
                    Regex Extr_Ex = new Regex(@"\bExtr\(-?\d*.\d*\);", RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Multiline);
                    Regex MoveL_Ex = new Regex(@"\bMoveL\s+(.*)(z\d.*)", RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Multiline);

                    Match MoveLString = null;

                    while (sr.Peek() >= 0)
                    {
                        CurrLine = sr.ReadLine();
                        //Console.WriteLine(sr.ReadLine());

                        // check if the line is a match 
                        if (Extr_Ex.IsMatch(CurrLine))
                        {
                            // Keep a count for total matches
                            MatchCount++;

                            // Save extr calls in a string
                            ExtCode_1 += NewLine + TAB + TAB + Extr_Ex.Match(CurrLine).ToString();


                            // Read next line (always a MoveL) to get Pos data for TriggL
                            NextLine = sr.ReadLine();
                            //Console.WriteLine(NextLine);

                            if (MoveL_Ex.IsMatch(NextLine))
                            {
                                // Next Line contains MoveL
                                // get matched string 
                                MoveLString = MoveL_Ex.Match(NextLine);
                                GroupCollection group = MoveLString.Groups;
                                MoveL_Pos_Data = group[1].Value.ToString();
                                MoveL_Ref_Data = group[2].Value.ToString();
                                MoveLFull = MoveL_Pos_Data + MoveL_Ref_Data;                                

                            }

                            // Replace Extr with the following commands
                            ModCode += NewLine + TAB + TAB + "TriggL " + MoveL_Pos_Data + "extr," + MoveL_Ref_Data;
                            ModCode += NewLine + TAB + TAB + "WaitDI DI1_1,1;";
                            ModCode += NewLine + TAB + TAB + "MoveL " + MoveLFull;
                            ModCode += NewLine + TAB + TAB + "Reset DO1_1;";
                            //break;

                        }
                        else
                        {
                            // No extr Match
                            ModCode += "\r\n" + CurrLine;
                        }                     

                    }

                    Console.WriteLine($"Total Matches: {MatchCount}");
                }


            }

            // Write modified code into a new output file
            string SaveDirectoryPath = Path.GetDirectoryName(openFileDialog1.FileName);
            string ModName = Path.GetFileNameWithoutExtension(openFileDialog1.FileName);
            SavePath = SaveDirectoryPath + @"\" + ModName + "_rev.mod";
            File.WriteAllText(SavePath, ModCode);

            //Write Extr matches into new output file 
            //Prepare module
            ExtCallMod = "MODULE ExtruderCalls";

            // All extr calls in one routine
            //Prepare routines
            ExtCallMod += NewLine + NewLine + TAB + "PROC Prg_ExtCall"; // + 1;
                ExtCallMod += ExtCode_1;
                ExtCallMod += NewLine + NewLine + TAB + "ENDPROC";
                ExtCallMod += NewLine + NewLine;

            //}

            ExtCallMod += "ENDMODULE";

            // Write to file
            string ExtCallSavePath = SaveDirectoryPath + @"\ExtrCalls.mod";                
            File.WriteAllText(ExtCallSavePath, ExtCallMod);                

        }

        catch (Exception ex)
        {
            Console.WriteLine(ex.ToString());                
        }

    }                    
}

While this helps me achieve what I want, the process is very slow. Since I am new to C# programming, I suspect that the slowness is coming from duplicating the original file contents to a string and NOT replacing content in place (I am not sure if contents in original file can be directly replaced). For an input file of 20,000 rows, the whole process is taking a little over 5 minutes.

I used to get the following error: Message=Managed Debugging Assistant 'ContextSwitchDeadlock' : 'The CLR has been unable to transition from COM context 0xb27138 to COM context 0xb27080 for 60 seconds. The thread that owns the destination context/apartment is most likely either doing a non pumping wait or processing a very long running operation without pumping Windows messages. This situation generally has a negative performance impact and may even lead to the application becoming non responsive or memory usage accumulating continually over time. To avoid this problem, all single threaded apartment (STA) threads should use pumping wait primitives (such as CoWaitForMultipleHandles) and routinely pump messages during long running operations.'

I was able to get past it by disabling the 'ContextSwitchDeadlock' setting in the debugger settings, although this may not be best practice.

Can anyone help me in improving the performance of my code?

EDIT: I found out that the robot controller limits the number of lines allowed in a MOD file (the output file). The maximum is 32768 lines. I came up with the following logic to split the contents of the StringBuilder into separate output files:

        // Split modCodeBuilder into separate strings based on final size
        const int maxSize = 32500;
        string result = modCodeBuilder.ToString();
        string[] splitResult = result.Split(new string[] { "\r\n" }, StringSplitOptions.None);
        string[] splitModCode = new string[maxSize]; 

        // Setup destination directory to be same as source directory
        string destDir = Path.GetDirectoryName(fileNames[0]);

        for (int count = 0; ; count++)
        {
            // Get the next batch of text by skipping the amount
            // we've taken so far and then taking the maxSize.
            string modName = $"PrgMOD_{count + 1}";
            string procName = $"Prg_{count + 1}()";

            // Use Array Copy to extract first 32500 lines from modCode[]
            int src_start_index = count * maxSize;
            int srcUpperLimit = splitResult.GetUpperBound(0);
            int dataLength = maxSize;

            if (src_start_index > srcUpperLimit) break; // Exit loop when there's no text left to take

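            // Take at most maxSize lines; the final batch may be shorter
            dataLength = Math.Min(maxSize, splitResult.Length - src_start_index);

            // Use a fresh array per batch so lines from the previous batch are not repeated
            splitModCode = new string[dataLength];

            Array.Copy(splitResult, src_start_index, splitModCode, 0, dataLength);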
            string finalModCode = String.Join("\r\n", splitModCode);

            string batch = String.Concat("MODULE ", modName, "\r\n\r\n\tPROC ", procName, "\r\n", finalModCode, "\r\n\r\n\tENDPROC\r\n\r\nENDMODULE");

            //if (batch.Length == 0) break; 

            // Generate file name based on count
            string fileName = $"ABB_R3DP_{count + 1}.mod";

            // Write our file text
            File.WriteAllText(Path.Combine(destDir, fileName), batch);

            // Write status to output textbox
            TxtOutput.AppendText("\r\n");
            TxtOutput.AppendText("\r\n");
            TxtOutput.AppendText($"Modified MOD File: {fileName} was generated successfully! It is saved to location: {Path.Combine(destDir, fileName)}");
        }
Rock
  • @Gauravsa Can you explain why these lines are the bottleneck and how they can be improved? Your answer doesn't answer my question as is. – Rock Feb 18 '19 at 02:50
  • String is immutable. Every time you make a change to a string, you are effectively creating a new string, allocating memory, and copying data from the existing string to the new string. – Gauravsa Feb 18 '19 at 02:53
  • here is a good link for read: http://jonskeet.uk/csharp/stringbuilder.html – Gauravsa Feb 18 '19 at 02:53
  • Personally, I'd use two threads for writing and one thread for reading, so that the file can be written concurrently as it is being read, second you can just find which process is the bottleneck by printing the `number of ticks` taken for a set of steps... also concentrate on regex matching – kowsikbabu Feb 18 '19 at 04:57
  • Avoid using `IsMatch` when you're planning on using the results. Directly use `Matches`, so you don't double the regex execution. Profile your specific regex expressions with `Compiled` - extremely simple expressions can actually run faster without it. Use `StringBuilder` – Sten Petrov Feb 18 '19 at 20:13

1 Answer

It's possible that the string concatenations are taking a long time. Using a StringBuilder instead may improve your performance:

private static void GenerateNewFile(string sourceFullPath)
{
    string posData = null;
    string refData = null;
    string fullData = null;

    var modCodeBuilder = new StringBuilder();
    var extCodeBuilder = new StringBuilder();

    var extrRegex = new Regex(@"\bExtr\(-?\d*.\d*\);", RegexOptions.Compiled | 
        RegexOptions.IgnoreCase | RegexOptions.Multiline);

    var moveLRegex = new Regex(@"\bMoveL\s+(.*)(z\d.*)", RegexOptions.Compiled | 
        RegexOptions.IgnoreCase | RegexOptions.Multiline);

    int matchCount = 0;
    bool appendModCodeNext = false;

    foreach (var line in File.ReadLines(sourceFullPath))
    {
        if (appendModCodeNext)
        {
            if (moveLRegex.IsMatch(line))
            {
                GroupCollection group = moveLRegex.Match(line).Groups;

                if (group.Count > 2)
                {
                    posData = group[1].Value;
                    refData = group[2].Value;
                    fullData = posData + refData;
                }
            }

            modCodeBuilder.Append("\t\tTriggL ").Append(posData).Append("extr,")
                .Append(refData).Append("\r\n\t\tWaitDI DI1_1,1;\r\n\t\tMoveL ")
                .Append(fullData).AppendLine("\r\n\t\tReset DO1_1;");

            appendModCodeNext = false;
        }
        else if (extrRegex.IsMatch(line))
        {
            matchCount++;
            extCodeBuilder.Append("\t\t").AppendLine(extrRegex.Match(line).ToString());
            appendModCodeNext = true;
        }
        else
        {
            modCodeBuilder.AppendLine(line);
        }
    }

    Console.WriteLine($"Total Matches: {matchCount}");

    string destDir = Path.GetDirectoryName(sourceFullPath);
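    // Append the suffix to the file name itself; passing it as a separate
    // Path.Combine argument would create a "<name>\_rev.mod" sub-path instead.
    var savePath = Path.Combine(destDir, 
        Path.GetFileNameWithoutExtension(sourceFullPath) + "_rev.mod");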

    File.WriteAllText(savePath, modCodeBuilder.ToString());

    // The extra "\r\n" after the PROC name keeps the first Extr call on its own line
    var extCallMod = string.Concat("MODULE ExtruderCalls\r\n\r\n\tPROC Prg_ExtCall\r\n",
        extCodeBuilder.ToString(), "\r\n\r\n\tENDPROC\r\n\r\nENDMODULE");

    File.WriteAllText(Path.Combine(destDir, "ExtrCalls.mod"), extCallMod);
}
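
A commenter on the question suggests running each regex only once per line instead of calling `IsMatch` and then `Match`. The following is a minimal sketch of that idea, not part of the tested code above: `SingleMatchSketch` and `TryParseMoveL` are hypothetical names introduced for illustration, and the pattern and options are copied from the `moveLRegex` above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SingleMatchSketch
{
    // Same MoveL pattern and options as moveLRegex in the code above.
    private static readonly Regex MoveLRegex = new Regex(
        @"\bMoveL\s+(.*)(z\d.*)",
        RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Multiline);

    // Hypothetical helper: executes the regex once and reports success
    // through the return value instead of a separate IsMatch call.
    public static bool TryParseMoveL(string line, out string posData, out string refData)
    {
        Match match = MoveLRegex.Match(line); // single regex execution

        if (match.Success)
        {
            posData = match.Groups[1].Value;
            refData = match.Groups[2].Value;
            return true;
        }

        posData = string.Empty;
        refData = string.Empty;
        return false;
    }
}
```

Inside the loop, `if (SingleMatchSketch.TryParseMoveL(line, out posData, out refData))` could then stand in for the `IsMatch`/`Match` pair. Whether this makes a measurable difference for a 20,000-line file is something to profile rather than assume.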

You mentioned in the comments that you want to take batches of the text and write them to separate files. One way to do this is to treat the string as a sequence of characters and use the System.Linq extension methods Skip and Take. Skip skips a given number of characters in the string, and Take then takes a given number of characters and returns them as an IEnumerable<char>. We can then use string.Concat to convert this back to a string and write it to a file.

If we have a constant that represents our max size, and a counter that starts at 0, we can use a for loop that increments counter and which skips counter * max characters, and then takes max characters from the string. We can also use the counter variable to create the file name, since it will increment on each iteration:

const int maxSize = 32500;
string result = modCodeBuilder.ToString();

for (int count = 0;; count++)
{
    // Get the next batch of text by skipping the amount
    // we've taken so far and then taking the maxSize.
    string batch = string.Concat(result.Skip(count * maxSize).Take(maxSize));

    if (batch.Length == 0) break; // Exit loop when there's no text left to take

    // Generate file name based on count
    string fileName = $"filename_{count + 1}.mod";

    // Write our file text
    File.WriteAllText(Path.Combine(destDir, fileName), batch);
}

Another way to do this that might be faster is to use string.Substring, and use count * maxSize as the start index of the substring to take. Then we just need to make sure our length doesn't exceed the bounds of the string, and write the substring to the file:

for (int count = 0;; count++)
{
    // Get the bounds for the substring (startIndex and length)
    var startIndex = count * maxSize;
    var length = Math.Min(result.Length - startIndex, maxSize);

    if (length < 1) break; // Exit loop when there's no text left to take

    // Get the substring and file name
    var batch = result.Substring(startIndex, length);
    string fileName = $"filename_{count + 1}.mod";

    // Write our file text  
    File.WriteAllText(Path.Combine(destDir, fileName), batch);
}

Note that this will split the text into blocks of exactly 32500 characters (except the last block). If you want to take only whole lines, that requires a bit more work but is still not hard to do.
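
As a rough sketch of the whole-line variant, assuming the text uses "\r\n" line endings as in the code above: split the string into lines, then take batches of lines with Skip and Take and rejoin each batch with string.Join. The names `LineBatchSketch` and `SplitIntoLineBatches` are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineBatchSketch
{
    // Hypothetical helper: splits text into batches of at most maxLines whole lines.
    public static List<string> SplitIntoLineBatches(string text, int maxLines)
    {
        if (maxLines < 1) throw new ArgumentOutOfRangeException(nameof(maxLines));

        // Split on "\r\n" (not just '\n') so no stray '\r' characters are left behind.
        string[] lines = text.Split(new[] { "\r\n" }, StringSplitOptions.None);
        var batches = new List<string>();

        for (int start = 0; start < lines.Length; start += maxLines)
        {
            // Take returns fewer items for the final batch; string.Join turns
            // the sequence of lines back into a single string.
            batches.Add(string.Join("\r\n", lines.Skip(start).Take(maxLines)));
        }

        return batches;
    }
}
```

Each returned batch can then be wrapped in its own module header and footer and written with File.WriteAllText, in the same way as the loops above.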

Rufus L
  • +1 !!! Wow! The performance improvement is drastic. The whole generation of files takes less than 300 ms (down from > 5 minutes!). `modCodeBuilder.AppendLine(string.Concat("\t\tTriggL ", posData, "extr,", refData, "\r\n\t\tWaitDI DI1_1,1;\r\n\t\tMoveL ", fullData, "\r\n\t\tReset DO1_1;"));` makes a lot of difference. In an earlier solution, I tried using StringBuilder something like this: `StringBuilder modCodeBuilder = new StringBuilder()` and used the `modCodeBuilder.Append()` method in place of `+=`. This made performance even worse. Upvoted – Rock Feb 18 '19 at 07:11
  • Is there a performance difference between `StringBuilder.Append` and `StringBuilder.AppendLine` ? Also is there a difference between `+=` and `String.Concat()`? – Rock Feb 18 '19 at 07:13
  • `AppendLine` just makes two calls to `Append` - one for the string and another to append the newline characters. Otherwise they're the same. You can see the source code [here](https://referencesource.microsoft.com/#mscorlib/system/text/stringbuilder.cs,73bc75596acbac77). – Rufus L Feb 18 '19 at 08:20
  • The `+=` and `+` for strings get compiled into `string.Concat` calls, so there should be no difference. See the answer [here](https://stackoverflow.com/questions/10341188/string-concatenation-using-operator). For concatenating more than 7 strings (I think? something like that), `StringBuilder` will be more efficient. Every time you add a string to another, both their lengths are examined, memory is allocated, and a new string is created. `StringBuilder` instead allocates a memory block cache ahead of time and then uses that to store the string. – Rufus L Feb 18 '19 at 08:28
  • If you had other code using `StringBuilder` and it was giving worse performance, it must have not been written correctly. – Rufus L Feb 18 '19 at 08:29
  • Thanks for the clarification. Is there any difference between `StreamReader.ReadLine()` vs `File.ReadLines()` methods? – Rock Feb 18 '19 at 13:23
  • I don't think so (at least performance wise). The File class method is a wrapper around the StreamReader class method. Source code is [here](https://referencesource.microsoft.com/mscorlib/a.html#d989485a49fbbfd2) – Rufus L Feb 18 '19 at 14:32
  • Thank you again. I have a question: What is the best way to split `modCodeBuilder` contents? There is a limit on the number of lines in the output file (32768 to be exact). The limit is from the robot controller. The number of lines in `modCodeBuilder.ToString()` will be usually greater than that number. What is the best way to take the first 32500 lines from `modCodeBuilder.ToString()` and save it in a file (filename_1.mod) and the next 32500 to filename_2.mod? Filenames are generated automatically depending on the number of splits required. – Rock Feb 18 '19 at 19:13
  • You can use `Skip` and `Take` to get batches of text from the string, and then save each batch to a different file. I updated the answer with a sample (at the bottom). – Rufus L Feb 18 '19 at 19:41
  • `string.Concat` and interpolation $"...{...}" are counter-productive when using StringBuilder - append the parts separately – Sten Petrov Feb 18 '19 at 20:14
  • Thank you for the suggestion. To split wholly along lines, I tried `string[] modCode = result.Split('\n');`. And instead of passing the original `result` string to `String.Concat()`, I passed the string array `modCode` like this: `string batch = string.Concat("MODULE ", modName, "\r\n\r\n\tPROC", procName, "\r\n", modCode.Skip(count * maxSize).Take(maxSize), "\r\n\r\n\tENDPROC\r\n\r\nENDMODULE");`. The extra arguments are headers for the new files. But the content of the output file has an enumerator object: `System.Linq.Enumerable+d__25`1[System.String]` instead of the actual content. – Rock Feb 18 '19 at 20:25
  • As you mentioned, the above methods take 32500 **characters** instead of whole lines. Can `string [] modCode = result.Split('\n')` be used as input to `String.Concat()` instead of `result` string? – Rock Feb 18 '19 at 20:50
  • If you only split on `\n` you will have a bunch of orphaned `\r` characters. – Rufus L Feb 18 '19 at 21:05
  • @RufusL now skip the `IsMatch` by assigning the result of `Matches` to a var and checking if it's got any entries, so the regex is not matched twice and you've got my +1 – Sten Petrov Feb 18 '19 at 21:28
  • @RufusL I have edited my original question to include a solution for taking Whole lines instead of characters. It works as expected. I wanted your opinion or any suggestions for improvement. Your answer for splitting stringbuilder contents returned a system object in the output file (like this `System.Linq.Enumerable+d__251[System.String]` ) and not the actual strings (contents) – Rock Feb 19 '19 at 00:40
  • Do you mean my answer returned the object, or your implementation (in the comment above) did that? Because your implementation is missing a string.concat around the `modCode.Skip(count * maxSize).Take(maxSize)` argument, (`Take` returns an `IEnumerable`) – Rufus L Feb 19 '19 at 00:57
  • I observed this both in your answer and my method (same as your answer) in the comment above. Even your answer above which has `String.Concat()` around Take method did that. Not sure what caused that. In my answer. I used a slightly different approach (updated in an edit to my original question). Thanks for your help! – Rock Feb 19 '19 at 01:18
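
The type-name output discussed in the comments can be reproduced with a small sketch. When a `Take` result is passed to `string.Concat` alongside other arguments, the compiler picks an overload that accepts `object` parameters and calls `ToString()` on the sequence, which yields its type name rather than its contents. Joining the sequence into a string first avoids that. `ConcatPitfallSketch` and its method names are hypothetical, and the header strings are placeholders.

```csharp
using System;
using System.Linq;

public static class ConcatPitfallSketch
{
    // The sequence is passed as one object argument, so Concat calls
    // ToString() on it and the result contains a type name.
    public static string WrongWay(string[] lines)
    {
        return string.Concat("PROC Prg_1()\r\n", lines.Take(2), "\r\nENDPROC");
    }

    // Joining the sequence first yields the actual line contents.
    public static string RightWay(string[] lines)
    {
        return string.Concat(
            "PROC Prg_1()\r\n",
            string.Join("\r\n", lines.Take(2)),
            "\r\nENDPROC");
    }
}
```

The exact type name printed by the first method depends on the .NET version in use, so it is not shown here.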