-1

I'm trying to parse 130,000 documents, and I want to do it as fast as I can.

This function removes the delimiter characters from a document.

public static unsafe string StripRestAndNewlines(string s)
{
    int len = s.Length;
    char* newChars = stackalloc char[len];
    char* currentChar = newChars;

    for (int i = 0; i < len; ++i)
    {
        char c = s[i];
        switch (c)
        {
            case ',':
            case '.':
            case ':':
            case ';':
            case '-':
            case '>':
            case '<':
            case '/':
            case '\\':
            case '?':
            case '"':
            case '*':
            case '&':
            case '_':
            case '+':
            case '@':
            case '[':
            case ']':
            case '!':
            case '=':
            case '%':
            case '#':
                continue;
            default:
                *currentChar++ = c;
                break;
        }
    }
    return new string(newChars, 0, (int)(currentChar - newChars));
}

But after about 2 minutes of running, the program stops and I get

System.StackOverflowException

Is there any delete[] or free for the allocation?

Thanks!

MethodMan
yntnm
  • 8
    *Why* are you trying to use stack allocation for this? Presumably your document is very large... trying to allocate the whole memory required for the string on the stack seems like a bad idea to me. – Jon Skeet Dec 03 '15 at 17:15
  • I'm trying to use as little memory as possible, and this is also the fastest way I have found. That's why I'm looking for a way to free/delete the allocation. – yntnm Dec 03 '15 at 17:18
  • If you are trying to remove chars, I wonder why you wouldn't write your own little extension method to replace particular chars: `public static string ReplaceCharsAt(this string input, int index, char newChar)` – MethodMan Dec 03 '15 at 17:19
  • 3
    Generally, using less memory is the opposite of going as fast as possible. – M.kazem Akhgary Dec 03 '15 at 17:19
  • Well, you're still creating a new string at the end of it, so you could just clone the original string and then modify a pinned version of that. – Jon Skeet Dec 03 '15 at 17:21
  • I think this [SO post](http://stackoverflow.com/questions/12190326/parsing-one-terabyte-of-text-and-efficiently-counting-the-number-of-occurrences) on the Trie data structure may help you. – Clay Ver Valen Dec 03 '15 at 18:09

2 Answers

3

Is there any delete[] or free for the allocation?

Yes: do nothing. Since it's stack-allocated, it is freed as soon as the method returns (your mention of delete[] suggests you are drawing an analogy to C++, but note that in C++ you don't delete[] stack-allocated variables either).

You won't get that far though, because you are stack-allocating too much.

stackalloc is of very limited use. It tends to be slower than just using heap memory unless you are using it as an alternative to fixed, or in a few situations where different threads are all allocating large arrays at the same time. It's only appropriate for arrays smaller than a few kilobytes at most.
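One way to respect that size limit is to use the stack only for short inputs and fall back to the heap for everything else. The following is a minimal sketch, not the asker's code: it assumes a newer toolchain than this thread (C# 7.2 or later with `Span<char>`), and the class name `HybridStripper`, the `Delimiters` string, and the 512-character threshold are all illustrative choices.

```csharp
using System;

public static class HybridStripper
{
    // Same characters as the question's switch, kept in a string for brevity.
    private const string Delimiters = ",.:;-></\\?\"*&_+@[]!=%#";

    // Hypothetical threshold: 512 UTF-16 chars is 1 KiB of stack.
    private const int MaxStackChars = 512;

    public static string StripDelimiters(string s)
    {
        if (s.Length == 0) return s;

        // Small inputs use the stack; anything larger goes to the heap,
        // so a huge document can no longer overflow the stack.
        Span<char> buffer = s.Length <= MaxStackChars
            ? stackalloc char[s.Length]
            : new char[s.Length];

        int count = 0;
        foreach (char c in s)
        {
            if (Delimiters.IndexOf(c) < 0) buffer[count++] = c;
        }

        // Span<char>.ToString() builds a string from the kept characters.
        return buffer.Slice(0, count).ToString();
    }
}
```

No `unsafe` block is needed here because the stack memory is only reachable through the span.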

You're going to be better off using a heap array. You may or may not be better off using pointers and fixed.
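As a rough illustration of the heap-array option, here is a sketch that keeps the question's logic but swaps `stackalloc` for an ordinary `char[]`. The class name `HeapStripper` and the `Delimiters` string are assumptions made for brevity; the original switch statement would work in the same place.

```csharp
using System;

public static class HeapStripper
{
    // Same characters as the question's switch, kept in a string for brevity.
    private const string Delimiters = ",.:;-></\\?\"*&_+@[]!=%#";

    public static string StripDelimiters(string s)
    {
        // Heap allocation: the size is limited by available memory,
        // not by the (typically about 1 MB) thread stack.
        char[] buffer = new char[s.Length];
        int count = 0;

        foreach (char c in s)
        {
            if (Delimiters.IndexOf(c) < 0) buffer[count++] = c;
        }

        // Copy only the characters that were kept into the result.
        return new string(buffer, 0, count);
    }
}
```

The temporary array becomes garbage as soon as the method returns, so there is still nothing to free by hand.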

You'd be much, much better off parsing the document(s) in chunks. If at all possible, load them from streams in moderate-sized segments of 4 KiB or 8 KiB and process each chunk as it arrives.
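The chunked approach could look roughly like the sketch below. It assumes the documents can be read through a `TextReader` and the cleaned text written to a `TextWriter`; the class name `ChunkedStripper`, the `Delimiters` string, and the default of 4096 chars per chunk are illustrative, not taken from the question.

```csharp
using System;
using System.IO;

public static class ChunkedStripper
{
    // Same characters as the question's switch, kept in a string for brevity.
    private const string Delimiters = ",.:;-></\\?\"*&_+@[]!=%#";

    public static void StripDelimiters(TextReader reader, TextWriter writer, int chunkSize = 4096)
    {
        // One fixed-size buffer is reused for every chunk, so memory use
        // stays constant no matter how large the document is.
        char[] buffer = new char[chunkSize];
        int read;

        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Compact the kept characters to the front of the same buffer.
            int kept = 0;
            for (int i = 0; i < read; i++)
            {
                char c = buffer[i];
                if (Delimiters.IndexOf(c) < 0) buffer[kept++] = c;
            }
            writer.Write(buffer, 0, kept);
        }
    }
}
```

For files, a `StreamReader` and `StreamWriter` can be passed in directly. Since each character is filtered independently, chunk boundaries do not affect the result.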

Jon Hanna
2

I don't think there is a delete method. It's a stack: you can add something on top and remove things from the top, but you cannot remove things from the middle. Allocated memory is automatically freed when the method returns. I think the stack overflow occurs when the incoming string is very long. Use heap memory for this task; just create a new array.

apocalypse
  • The heap is still slower for me, but it's good to know that I can't free it. Thanks. – yntnm Dec 03 '15 at 17:25
  • Then maybe create a static array? You would avoid allocating a new array on every method call. If an incoming string is too big, resize the array. – apocalypse Dec 03 '15 at 17:28
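A sketch of the reusable-buffer idea from the last comment follows. It goes one step beyond the comment by marking the field `[ThreadStatic]`, on the assumption that documents might be processed on several threads; the class name `ReusableBufferStripper`, the `Delimiters` string, and the 1024-char minimum size are illustrative.

```csharp
using System;

public static class ReusableBufferStripper
{
    // Same characters as the question's switch, kept in a string for brevity.
    private const string Delimiters = ",.:;-></\\?\"*&_+@[]!=%#";

    // One buffer per thread, so concurrent calls never share scratch space.
    [ThreadStatic]
    private static char[] buffer;

    public static string StripDelimiters(string s)
    {
        // Grow the buffer only when an input is longer than anything seen so far.
        if (buffer == null || buffer.Length < s.Length)
        {
            buffer = new char[Math.Max(s.Length, 1024)];
        }

        char[] local = buffer;
        int count = 0;

        foreach (char c in s)
        {
            if (Delimiters.IndexOf(c) < 0) local[count++] = c;
        }

        return new string(local, 0, count);
    }
}
```

The trade-off is that the buffer stays as large as the longest document seen on that thread, which works against the goal of minimal memory use.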