1

I'm trying to parse a large text string. I need to split the original string into blocks of 15 characters (the next block might start with whitespace, so I use the Trim function). I'm using two strings, the original and a temporary one. The temporary string stores each 15-character block. I wonder whether I could run into a performance issue because strings are immutable. This is the code:

string original = "THIS IS SUPPOSE TO BE A LONG STRING AN I NEED TO SPLIT IT IN BLOCKS OF 15 CHARACTERS.SO";
string temp = string.Empty;
while (original.Length != 0)
{
   temp = original.Substring(0, 14).Trim();

   original = original.Substring(14, (original.Length -14)).Trim();
}

I'd appreciate your feedback on the best way to achieve this functionality.

Michael Hidalgo

4 Answers

3

You'll get slightly better performance like this (but whether the performance gain will be significant is another matter entirely):

for (var startIndex = 0; startIndex < original.Length; startIndex += 15)
{
    temp = original.Substring(startIndex, Math.Min(original.Length - startIndex, 15)).Trim();
}

This performs better because each loop iteration no longer copies the rest of the original string (everything after the current 15-character block).

EDIT

To advance the index to the next non-whitespace character, you can do something like this:

for (var startIndex = 0; startIndex < original.Length; )
{
    if (char.IsWhiteSpace(original, startIndex))
    {
        startIndex++;
        continue;
    }
    temp = original.Substring(startIndex, Math.Min(original.Length - startIndex, 15)).Trim();
    startIndex += 15;
}
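
As a minimal, self-contained sketch of the loop above (the class name BlockSplitter, the method name SplitIntoBlocks, and the blockSize parameter are hypothetical, introduced only for illustration), the blocks can be collected into a list. It assumes that a block should always start on a non-whitespace character and that only trailing whitespace then needs trimming:

```csharp
using System;
using System.Collections.Generic;

static class BlockSplitter
{
    // Hypothetical helper: returns blocks of up to blockSize characters,
    // each starting on a non-whitespace character.
    public static List<string> SplitIntoBlocks(string original, int blockSize = 15)
    {
        var blocks = new List<string>();
        var startIndex = 0;
        while (startIndex < original.Length)
        {
            // Skip whitespace so that a block never starts with it.
            if (char.IsWhiteSpace(original, startIndex))
            {
                startIndex++;
                continue;
            }

            // Clamp the length so the last block does not run past the end.
            var length = Math.Min(original.Length - startIndex, blockSize);

            // The block already starts on a non-whitespace character,
            // so only the end needs trimming.
            blocks.Add(original.Substring(startIndex, length).TrimEnd());
            startIndex += length;
        }
        return blocks;
    }

    static void Main()
    {
        foreach (var block in SplitIntoBlocks("THIS IS SUPPOSE TO BE A LONG STRING"))
        {
            Console.WriteLine("[" + block + "]");
        }
    }
}
```

Each block costs one Substring call (plus a TrimEnd copy only when there is trailing whitespace), and the original string is never copied as a whole.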
phoog
  • this code is very clear and easy to read. Thanks for the suggestion. – Michael Hidalgo Nov 25 '11 at 17:44
  • The idea is to get blocks of 15 characters(without white spaces). In this code, if there were 5 white spaces at the beginning, the result string will be 10 length. – Michael Hidalgo Nov 25 '11 at 18:09
  • @MichaelHidalgo your sample code would do the same, as far as I can tell. Also, before I edit my answer to meet this requirement, let me ask how to handle a case like "XXXX XXXX XXXX XXXX XXXX"? The first block would be 14 characters long ("XXXX XXXX XXXX"). Is that not acceptable? – phoog Nov 25 '11 at 18:35
  • thanks for your response. The main idea is to take 15 characters after applying the trim. In your example, you apply the trim after considering the 15 characters(including white spaces) and based on my understanding, the result string would be less than 15 characters. A possible solution would be applying the Trim() before considering 15 characters. But I can figure it out. Thanks for your help.I'm facing a scenario and there are 8 white spaces at the beginning of the second iteration. – Michael Hidalgo Nov 25 '11 at 18:40
  • Hey Thanks, char.IsWhiteSpace() function is the key!!. – Michael Hidalgo Nov 25 '11 at 20:56
1

I think you are right about the immutable issue - recreating 'original' each time is probably not the fastest way.

How about passing 'original' into a StringReader class?
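
A rough sketch of what that could look like, assuming fixed 15-character reads and a Trim on each block (the names ReaderDemo and ReadBlocks are hypothetical). Note that this version trims after reading, so a block that begins with whitespace comes out shorter than 15 characters:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ReaderDemo
{
    // Hypothetical helper: reads the string in fixed-size chunks through a
    // StringReader and returns each chunk trimmed.
    public static List<string> ReadBlocks(string original, int blockSize = 15)
    {
        var blocks = new List<string>();
        var buffer = new char[blockSize];

        using (var reader = new StringReader(original))
        {
            int read;
            // ReadBlock copies up to blockSize characters into the buffer and
            // returns how many were read (0 once the end is reached).
            while ((read = reader.ReadBlock(buffer, 0, blockSize)) > 0)
            {
                blocks.Add(new string(buffer, 0, read).Trim());
            }
        }
        return blocks;
    }

    static void Main()
    {
        foreach (var block in ReadBlocks("THIS IS SUPPOSE TO BE A LONG STRING"))
        {
            Console.WriteLine("[" + block + "]");
        }
    }
}
```

The reader keeps track of the current position, so the original string is never re-created; the only allocations are the reusable buffer and one string per block (plus a trimmed copy when Trim removes something).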

Neil Thompson
1

If your original string is longer than a few thousand chars, you'll have noticeable (>0.1s) processing time and a lot of GC pressure. The first Substring call is fine, and I don't think you can avoid it unless you go deep inside System.String and mess around with m_FirstChar. The second Substring call can be avoided completely by going char-by-char and iterating with an int index.

MagnatLU
0

In general, code like this might be problematic if you run it on bigger data; it depends, of course, on your needs.

It might be a good idea to use the StringBuilder class, which lets you operate on strings in a "more mutable" way without the performance hit, for example removing characters from the beginning without reallocating the whole string.

In your example, however, I would consider throwing out the line that takes a substring from original and replacing it with code that updates an index pointing to where the next substring should start. The while condition would then just check whether the index has reached the end of the string, and the line that assigns temp would take the substring not from 0 to 14 but starting from i, where i is that index.

However, don't optimize code if you don't have to. I'm assuming here that you need more performance and are willing to sacrifice some time and/or write slightly less understandable code for more efficiency.

Marcin Deptuła