Due to the ~2.15 billion element limit in the .NET Framework (even with 64-bit Windows, .NET 4.5+, and gcAllowVeryLargeObjects enabled), I needed to create my own BigStringBuilder to manipulate extremely large strings.

Unfortunately, I now need to use Regex with this class. It appears that code exists to run a simpler Regex flavour over StringBuilders, though it's apparently not well tested and only supports * (match many chars) and ? (match a single char).

And anyway, as I said, I'm not using a StringBuilder; I'm using my own BigStringBuilder class, whose underlying structure is a List of char arrays (i.e. List<char[]> c = new List<char[]>();). To retrieve any char within the giant string, a 'clever' indexer accesses the rectangular structure:

// Indexer for class BigStringBuilder:
public char this[long n]
{
    get { return c[(int)(n / pagesize)][n % pagesize]; }
    set { c[(int)(n / pagesize)][n % pagesize] = value; }
}

It's not that 'clever' to be honest, but it does mean all the string data is potentially scattered across numerous char arrays within the List.
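
To make the layout concrete, here is a minimal sketch of such a paged buffer. Only the indexer and the `List<char[]> c` field come from my class as described above; the class name `PagedChars`, the `Append` and `Substring` members, and the default page size are assumptions made purely for illustration. `Substring` shows how a range that straddles several pages can be copied out into an ordinary string.

```csharp
using System;
using System.Collections.Generic;

// Illustrative stand-in for BigStringBuilder (hypothetical name and members).
public class PagedChars
{
    private readonly List<char[]> c = new List<char[]>();
    private readonly int pagesize;

    public long Length { get; private set; }

    // The default page size is an arbitrary choice for this sketch.
    public PagedChars(int pagesize = 1 << 20)
    {
        if (pagesize < 1) throw new ArgumentOutOfRangeException("pagesize");
        this.pagesize = pagesize;
    }

    // Same mapping as the indexer above: page = n / pagesize, offset = n % pagesize.
    public char this[long n]
    {
        get { return c[(int)(n / pagesize)][(int)(n % pagesize)]; }
        set { c[(int)(n / pagesize)][(int)(n % pagesize)] = value; }
    }

    public void Append(string s)
    {
        foreach (char ch in s)
        {
            // Allocate a fresh page whenever the existing pages are full.
            if (Length == (long)c.Count * pagesize) c.Add(new char[pagesize]);
            this[Length] = ch;
            Length++;
        }
    }

    // Copies [start, start + count) into a normal string, crossing page boundaries as needed.
    public string Substring(long start, int count)
    {
        if (start < 0 || count < 0 || start + count > Length)
            throw new ArgumentOutOfRangeException();
        var result = new char[count];
        int copied = 0;
        while (copied < count)
        {
            long pos = start + copied;
            int page = (int)(pos / pagesize);
            int offset = (int)(pos % pagesize);
            int take = Math.Min(pagesize - offset, count - copied);
            Array.Copy(c[page], offset, result, copied, take);
            copied += take;
        }
        return new string(result);
    }
}
```

With a page size of 4, appending "hello world" spreads the text over three pages, and `Substring(3, 6)` has to stitch together pieces of all three.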

I am looking for the most effective way, or any insights, to make Regex (including Regex.Replace()) work with this BigStringBuilder class, bearing in mind that strings could be much bigger than 2GB.

Dan W
  • Interesting problem. My guess is that the effort it would take to create such a feature which *fully* supports all the same stuff that .NET Regex does would not be worth the time, considering how unusual this problem is. I would recommend breaking up your problem into specific cases that can be dealt with individually. You can utilize things like max regex capture length and specific sections of the regex to create case-based methods. I would be interested to see some examples of the sort of regexes you'll be using? – Josh Withee Feb 16 '19 at 21:06
  • @Marathon55: Yes, just basic Regex such as, say, `★` representing a single wildcard char and `✪` representing multiple wildcard chars would be useful. Perhaps a greedy version of the latter and the 'repeat' function (Regex `+`) would be cool too, but not totally necessary. Not surprisingly, algorithm speed/efficiency is important to me though. – Dan W Feb 16 '19 at 21:49
  • I ended up writing my own simplified version of Regex: https://stackoverflow.com/a/54820605/848344 – Dan W Feb 22 '19 at 05:32
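
The "max regex capture length" idea from the first comment can be sketched as a sliding-window scan that hands ordinary strings to the standard `Regex` class. This is only a rough sketch, not the approach in the linked answer, and it rests on loud assumptions: no match is longer than `maxMatchLength`, the pattern never matches the empty string, and it uses no anchors or lookaround (`^`, `$`, `\b`, lookbehind, and so on), since those would see window edges instead of the real surrounding text. The names `WindowedRegex`, `read`, `maxMatchLength` and `windowSize` are hypothetical; `read(start, count)` stands for whatever copies a range of the big buffer into a normal string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WindowedRegex
{
    // Yields (absolute position, matched text) pairs in order.
    // Assumes every match is at most maxMatchLength chars (see the caveats above).
    public static IEnumerable<KeyValuePair<long, string>> Matches(
        Func<long, int, string> read, long length, Regex regex,
        int maxMatchLength, int windowSize)
    {
        if (maxMatchLength < 1 || windowSize <= maxMatchLength)
            throw new ArgumentException("windowSize must exceed maxMatchLength");

        // A match starting before 'step' is guaranteed to fit inside the window.
        int step = windowSize - maxMatchLength;
        long pos = 0;
        while (pos < length)
        {
            int count = (int)Math.Min(windowSize, length - pos);
            string window = read(pos, count);
            bool isLast = pos + count >= length;
            int resume = step;

            for (Match m = regex.Match(window); m.Success; m = m.NextMatch())
            {
                // Matches starting in the tail may be truncated; leave them for the next window.
                if (!isLast && m.Index >= step) break;
                yield return new KeyValuePair<long, string>(pos + m.Index, m.Value);
                // Resume after the accepted match so overlapping text is not matched twice.
                resume = Math.Max(step, m.Index + m.Length);
            }

            if (isLast) yield break;
            pos += resume;
        }
    }
}
```

A replace operation could presumably be layered on top by streaming the text between consecutive matches, plus the replacement strings, into a second big buffer, though that part is not shown here. Larger windows relative to `maxMatchLength` mean less text is re-scanned per step.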

0 Answers