It might be helpful to read about the span<T>
data structure.
Related Answer: What is a "span" and when should I use one?
A span<T>
is:
- A very lightweight abstraction of a contiguous sequence of values of type
T
somewhere in memory.
- Basically a
struct { T * ptr; std::size_t length; }
with a bunch of convenience methods.
- A non-owning type (i.e. a "reference-type" rather than a "value type"): It never allocates nor deallocates anything and does not keep
smart pointers alive.
I would add that if you are processing a stream of characters, you will probably want to use buffering (or perhaps more apt - "chunking") where each chunk is itself a span<char>
of fixed-size (which are all stored in a separate bit of memory) but tracked in a central array (or a more complex data structure like a double-linked-list, to facilitate quick deletion)
It would be an anti-pattern to attempt to actually maintain your entire stream of data in a single piece of contiguous physical memory (which you seem to suggest in part 3 of your request) - especially if you plan on deleting chunks of it. There should other ways to facilitate fast deletion without sacrificing performance elsewhere.
For example if you wish to delete a range of characters that falls into a given span, you can create two new spans from the start and end of the original span, excluding the deleted characters, and then replace the original span instance in your larger data structure (e.g if it were a double-linked list) with the two new smaller spans. None of this requires copying the underlying data itself, just slicing up our lightweight references to the underlying data.
If your language of choice doesn't support span, or a similar structure, check out how span is implemented.
Depending on your language of choice, it may even have built-in support for streaming spans (as .NET Core 2.1+ (2018) does).
Any additional requirements (such fast indexing to any point in your data stream, net of any deletions) can be satisfied by maintaining separate data structures that carry metadata about your spans (such as the suggested linked list). They will need updating when spans are deleted or added to, but because spans are a thin layer on top of large strings of characters, they reduce the cardinality of data structures you are maintaining by several orders of magnitude, so while you could get fancy with maintaining a variety of heaps and maps to facilitate O(1)
algorithms for every operation, you will probably find that basic structures and O(log(n))
or even O(N)
(where N
is actually N/chunk-size
) maintenance operations are feasible.