
I have a string that "may" be longer than the int range allows.

Currently, the String.Substring method takes only int parameters for the index and length. That is not enough for me, since I need long parameters.

Do you know of any implementation of a long-indexed Substring function?

Or what do you recommend I do to find a substring in such a very long string?

Thank you.

Gloomdo
  • A string that long (>2G chars) would take up >4GB of memory. Are you sure that the substring function is going to be your only problem? – Jon Mar 12 '11 at 12:40
  • How long is the string? And what about the substring? You probably don't want to load the entire string into memory at once, but rather use a file stream to read portions of the file while searching for the substring? – Klaus Byskov Pedersen Mar 12 '11 at 12:40
  • The string matching will probably take place only in memory, and the machine is supposed to have a very large amount of memory available. The whole string is around 10G chars and the substring is about 1000 chars. I agree that any string matching operation on strings that large would be algorithmically idiotic, in addition to the resource requirements. – Gloomdo Mar 12 '11 at 12:44

4 Answers


I have a string that "may" be longer than any simple int boundaries.

No, in .NET you won't have that problem. The System.String class itself uses Int32 for indexing and for its Length property everywhere.

Maybe you will have a (char) array that's over 2GB, but that is taken care of: you can use `long` indexing.

Related question: What is the maximum possible length of a .NET string?
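To illustrate the Int32 point: even a helper that accepts `long` arguments gains nothing, because any valid start index and length are bounded by `String.Length`, which is an `int`. The class and method names below (`LongIndexing.SubstringLong`) are hypothetical; this is a minimal sketch, not an existing .NET API:

```csharp
using System;

static class LongIndexing
{
    // Hypothetical helper: accepts long arguments, but String.Length is an int,
    // so every valid (startIndex, length) pair already fits in an int.
    public static string SubstringLong(string s, long startIndex, long length)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        // Range checks are written so that no addition can overflow a long.
        if (startIndex < 0 || startIndex > s.Length)
            throw new ArgumentOutOfRangeException(nameof(startIndex));
        if (length < 0 || length > s.Length - startIndex)
            throw new ArgumentOutOfRangeException(nameof(length));

        // Both values are now <= s.Length <= int.MaxValue, so the casts are safe;
        // checked() documents that intent and would throw if it ever did not hold.
        return s.Substring(checked((int)startIndex), checked((int)length));
    }
}
```

For example, `LongIndexing.SubstringLong("hello world", 6L, 5L)` returns `"world"`; the `long` parameters add a conversion step but cannot address anything `Substring(int, int)` could not.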

H H
  • I like the comment _This is one of those situations where "If you have to ask, you're probably doing something wrong."_ – Gloomdo Mar 12 '11 at 12:55
  • Not even arrays... See here: http://stackoverflow.com/questions/573692/is-the-size-of-an-array-constrained-by-the-upper-limit-of-int-2147483647/573701#573701 and http://stackoverflow.com/questions/1087982/single-objects-still-limited-to-2-gb-in-size-in-clr-4-0/1088044#1088044 – xanatos Mar 12 '11 at 13:25
  • @xanatos: You're right, I was thinking about that LongLength property but that's not indexing. – H H Mar 12 '11 at 13:42

As the answer in the link Henk provides states, you cannot create an object larger than 2GB in .NET (64-bit processes also have this restriction).

Therefore you can't have a string that big no matter what. You will need to use some sort of streaming algorithm to find and isolate the data you are interested in.
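A minimal sketch of such a streaming search, assuming the text can be read through a `TextReader` (for example a `StreamReader` over a file). It reads fixed-size chunks, keeps the last `pattern.Length - 1` characters of each window so that matches spanning a chunk boundary are not missed, and reports the position as a `long`, since an offset into a 10G-char text can exceed `int.MaxValue`. `StreamSearch` and its `bufferSize` parameter are hypothetical names:

```csharp
using System;
using System.IO;

static class StreamSearch
{
    // Returns the absolute character offset (as a long) of the first ordinal
    // match of `pattern` in the text read from `reader`, or -1 if there is none.
    public static long IndexOf(TextReader reader, string pattern, int bufferSize = 1 << 20)
    {
        if (reader == null) throw new ArgumentNullException(nameof(reader));
        if (string.IsNullOrEmpty(pattern))
            throw new ArgumentException("pattern must be non-empty", nameof(pattern));
        if (bufferSize < 1) throw new ArgumentOutOfRangeException(nameof(bufferSize));

        var buffer = new char[bufferSize];
        string carry = "";   // tail of the previous window, at most pattern.Length - 1 chars
        long carryStart = 0; // absolute offset of carry[0] in the whole text
        int read;

        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Window = carried tail + new chunk, so boundary-spanning matches are seen.
            string window = carry + new string(buffer, 0, read);
            int hit = window.IndexOf(pattern, StringComparison.Ordinal);
            if (hit >= 0) return carryStart + hit;

            // Keep only the chars that could still begin a match in the next chunk.
            int keep = Math.Min(pattern.Length - 1, window.Length);
            carry = window.Substring(window.Length - keep);
            carryStart += window.Length - keep;
        }
        return -1;
    }
}
```

Usage might look like `StreamSearch.IndexOf(new StreamReader("huge.txt"), needle)` for a hypothetical file `huge.txt`. Only about `bufferSize + pattern.Length` characters are held in memory at a time; concatenating the tail onto each chunk costs one extra copy per chunk, which keeps the sketch simple.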

Jon

As Henk Holterman said, System.String uses Int32...

But if needed, use an unsigned int, which can go up to about 4.3 billion: try uint.

uint stringLength = 4294967295;

Though it doesn't go that much higher than a normal int:

int -> -2,147,483,648 to 2,147,483,647
uint -> 0 to 4,294,967,295
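A small sketch of what the `uint` suggestion means in practice, assuming the index is eventually passed to `String.Substring(int, int)`: that method only accepts `int`, so a `uint` value above `int.MaxValue` has to be rejected rather than cast silently. `IndexRange` and its methods are hypothetical helpers:

```csharp
static class IndexRange
{
    // int:  -2,147,483,648 .. 2,147,483,647
    // uint:              0 .. 4,294,967,295
    // String.Substring(int, int) accepts only int, so a uint index is usable
    // there only when it is <= int.MaxValue.
    public static bool FitsInInt(uint value) => value <= int.MaxValue;

    // Converts a uint for use with Substring; checked() throws OverflowException
    // for values above int.MaxValue instead of silently wrapping to a negative int.
    public static int ToSubstringIndex(uint value) => checked((int)value);
}
```

For example, `IndexRange.ToSubstringIndex(uint.MaxValue)` throws an `OverflowException`. So `uint` can hold a larger count, but it does not extend what `Substring` itself can address.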

Reza M.

Also, conventional substring search algorithms might not work well at that scale (admittedly, I do not know how .Substring works internally). You might want to take a look at this.
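One well-known algorithm that suits this streaming setting is Knuth-Morris-Pratt (KMP), offered here as an example and not necessarily what the link describes. It consumes the text one character at a time and never backs up, so its memory use depends only on the pattern length, not on the text length. A minimal sketch assuming a `TextReader` source and ordinal (exact `char`) comparison; `KmpStream` is a hypothetical name:

```csharp
using System;
using System.IO;

static class KmpStream
{
    // fail[i] = length of the longest proper prefix of p[0..i]
    // that is also a suffix of p[0..i].
    static int[] BuildFailure(string p)
    {
        var fail = new int[p.Length];
        int k = 0;
        for (int i = 1; i < p.Length; i++)
        {
            while (k > 0 && p[i] != p[k]) k = fail[k - 1];
            if (p[i] == p[k]) k++;
            fail[i] = k;
        }
        return fail;
    }

    // Scans the reader one char at a time; extra memory is O(pattern.Length).
    // Returns the long offset of the first match, or -1 if there is none.
    public static long IndexOf(TextReader reader, string pattern)
    {
        if (reader == null) throw new ArgumentNullException(nameof(reader));
        if (string.IsNullOrEmpty(pattern))
            throw new ArgumentException("pattern must be non-empty", nameof(pattern));

        int[] fail = BuildFailure(pattern);
        int matched = 0; // number of pattern chars currently matched
        long pos = 0;    // offset of the char being examined
        int c;

        while ((c = reader.Read()) != -1)
        {
            char ch = (char)c;
            // On a mismatch, fall back to the longest border instead of rescanning text.
            while (matched > 0 && ch != pattern[matched]) matched = fail[matched - 1];
            if (ch == pattern[matched]) matched++;
            if (matched == pattern.Length) return pos - pattern.Length + 1;
            pos++;
        }
        return -1;
    }
}
```

Reading single characters through `TextReader.Read()` is acceptable here because a `StreamReader` buffers internally; the whole search runs in time linear in the text length.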

SWeko