

I have a file with some data in it. Now I want to add some content, but not by appending it. It's more like "add this block of 4 bytes between the current 10th and 11th byte in this file". Currently I'm using FileStream to read and write files.
So my question: is there a way to insert this data without rewriting the entire file?
Thank you,
Nils.

BDevGW



Edit 2 - the rewrite

After a lot of comments, I figured out that the real issue is that you have a database that mostly works like a file system. The biggest difference is probably that the clusters know which file they belong to, rather than the other way around. I am going to use file system terminology for the DDL/schema. Sorry, I cannot get proper SQL syntax highlighting to work.

CREATE TABLE Files(
  ID INTEGER PRIMARY KEY
  /* a bunch of other columns that do not really matter for this */
);

CREATE TABLE Clusters(
  ID INTEGER PRIMARY KEY,
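  FK_FileID INTEGER NOT NULL REFERENCES Files(ID), -- special, see text
  ClusterNumber INTEGER NOT NULL, -- special, see text
  Contents BLOB, -- or whatever type you need
  UNIQUE (FK_FileID, ClusterNumber) -- shared unique constraint, see text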
);

Clusters is an odd table in many regards:

  • The primary key is mostly irrelevant. Indeed, you can probably remove indexing for it. The only reasons I have it are a) habit, b) because you might regret lacking it, and c) it might be useful for management work.
  • ClusterNumber is "the N-th cluster of file FK_FileID".
  • ClusterNumber and FK_FileID should have a shared unique constraint (the combination of both must be unique) and should probably be covered by an index on both columns. Think of them as a composite primary key or a multi-column surrogate key (which does sound like an oxymoron). You will use those far more often than the official PK.
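To make the constraint concrete, here is a minimal sketch of the schema in SQLite, driven from Python's built-in `sqlite3` module. SQLite and Python are stand-ins chosen for illustration (the question uses C#), `Contents` is assumed to be a BLOB, and `make_db`/`duplicate_is_rejected` are hypothetical helpers. SQLite implements a UNIQUE constraint with an automatic index, so declaring it also gives you the index over both columns.

```python
import sqlite3

# Hypothetical SQLite version of the schema above; Contents is assumed to be a BLOB.
SCHEMA = """
CREATE TABLE Files(
  ID INTEGER PRIMARY KEY
);
CREATE TABLE Clusters(
  ID INTEGER PRIMARY KEY,
  FK_FileID INTEGER NOT NULL REFERENCES Files(ID),
  ClusterNumber INTEGER NOT NULL,
  Contents BLOB,
  UNIQUE (FK_FileID, ClusterNumber)  -- SQLite creates an index for this
);
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with the Files/Clusters schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def duplicate_is_rejected(conn: sqlite3.Connection) -> bool:
    """Return True if a second cluster with the same (file, number) pair is refused."""
    conn.execute("INSERT INTO Files(ID) VALUES (1)")
    conn.execute(
        "INSERT INTO Clusters(FK_FileID, ClusterNumber, Contents) VALUES (1, 0, ?)",
        (b"first",),
    )
    try:
        conn.execute(
            "INSERT INTO Clusters(FK_FileID, ClusterNumber, Contents) VALUES (1, 0, ?)",
            (b"second",),
        )
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    print(duplicate_is_rejected(make_db()))  # True
```

The same cluster number is still allowed for a different file; only the (FK_FileID, ClusterNumber) pair has to be unique.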

You would get all the clusters for a File like this:

SELECT Contents FROM Clusters
WHERE FK_FileID = /* the file whose whole data you want */
ORDER BY ClusterNumber;

This query is nicely covered by that extra index.
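A minimal sketch of that read path, under the same assumptions as before: SQLite through Python's `sqlite3` as a stand-in DBMS, and `read_file` as a hypothetical helper. The clusters are inserted out of order on purpose: rows come back in file order only because of the ORDER BY.

```python
import sqlite3

# Hypothetical SQLite version of the Files/Clusters schema from the answer.
SCHEMA = """
CREATE TABLE Files(ID INTEGER PRIMARY KEY);
CREATE TABLE Clusters(
  ID INTEGER PRIMARY KEY,
  FK_FileID INTEGER NOT NULL REFERENCES Files(ID),
  ClusterNumber INTEGER NOT NULL,
  Contents BLOB,
  UNIQUE (FK_FileID, ClusterNumber)
);
"""


def read_file(conn: sqlite3.Connection, file_id: int) -> bytes:
    """Concatenate a file's clusters in ClusterNumber order."""
    rows = conn.execute(
        "SELECT Contents FROM Clusters WHERE FK_FileID = ? ORDER BY ClusterNumber",
        (file_id,),
    )
    return b"".join(contents for (contents,) in rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Files(ID) VALUES (1)")
    # Inserted out of order on purpose: storage order is not file order.
    conn.executemany(
        "INSERT INTO Clusters(FK_FileID, ClusterNumber, Contents) VALUES (?, ?, ?)",
        [(1, 1, b"world"), (1, 0, b"hello ")],
    )
    print(read_file(conn, 1))  # b'hello world'
```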

If you want to insert a segment anywhere in a file, you would:

  1. Increase the ClusterNumber of every following cluster of that file by 1.
  2. Add a new cluster entry for this file with the freed-up ClusterNumber.
  3. You can be somewhat wasteful with that last step, e.g. adding a cluster that holds only 4 bytes.
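The steps above might look like this, again as a sketch on SQLite via Python's `sqlite3` (a stand-in; `insert_cluster` and `read_file` are hypothetical helpers, and cluster numbers are assumed to start at 0). The shift is done in two passes through negative numbers because SQLite checks the unique constraint row by row during an UPDATE, so a single `ClusterNumber = ClusterNumber + 1` can collide with the next cluster midway. The usage example inserts the question's 4 bytes between the 10th and 11th byte as a 4-byte cluster.

```python
import sqlite3

# Hypothetical SQLite version of the Files/Clusters schema from the answer.
SCHEMA = """
CREATE TABLE Files(ID INTEGER PRIMARY KEY);
CREATE TABLE Clusters(
  ID INTEGER PRIMARY KEY,
  FK_FileID INTEGER NOT NULL REFERENCES Files(ID),
  ClusterNumber INTEGER NOT NULL,
  Contents BLOB,
  UNIQUE (FK_FileID, ClusterNumber)
);
"""


def read_file(conn: sqlite3.Connection, file_id: int) -> bytes:
    """Concatenate a file's clusters in ClusterNumber order."""
    rows = conn.execute(
        "SELECT Contents FROM Clusters WHERE FK_FileID = ? ORDER BY ClusterNumber",
        (file_id,),
    )
    return b"".join(contents for (contents,) in rows)


def insert_cluster(conn: sqlite3.Connection, file_id: int, position: int, data: bytes) -> None:
    """Insert `data` as a new cluster at `position`, shifting later clusters up by one.

    Assumes cluster numbers start at 0 and are never negative.
    """
    with conn:  # one transaction: all three statements take effect, or none do
        # Step 1, first pass: park every following cluster at -(n + 1).
        conn.execute(
            "UPDATE Clusters SET ClusterNumber = -(ClusterNumber + 1) "
            "WHERE FK_FileID = ? AND ClusterNumber >= ?",
            (file_id, position),
        )
        # Step 1, second pass: flip them back, now one higher than before.
        conn.execute(
            "UPDATE Clusters SET ClusterNumber = -ClusterNumber "
            "WHERE FK_FileID = ? AND ClusterNumber < 0",
            (file_id,),
        )
        # Step 2: the freed-up number goes to the new (possibly tiny) cluster.
        conn.execute(
            "INSERT INTO Clusters(FK_FileID, ClusterNumber, Contents) VALUES (?, ?, ?)",
            (file_id, position, data),
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Files(ID) VALUES (1)")
    conn.executemany(
        "INSERT INTO Clusters(FK_FileID, ClusterNumber, Contents) VALUES (?, ?, ?)",
        [(1, 0, b"0123456789"), (1, 1, b"ABCDEFGHIJ")],
    )
    conn.commit()
    insert_cluster(conn, 1, 1, b"WXYZ")  # 4 bytes between the 10th and 11th byte
    print(read_file(conn, 1))  # b'0123456789WXYZABCDEFGHIJ'
```

Wrapping both steps in one transaction means a crash leaves either the old or the new cluster list, which is the same safety concern raised in the comments about rewriting files.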

Assuming you store this on an HDD/rotating disk, you will probably not get around defragmenting this regularly anyway, so you might as well consolidate the clusters while doing so to cut the waste. Unless you can somehow teach the DBMS to read this properly (you would need to ask DB experts about that), you want the clusters of a file to be in sequence in the DB as much as possible, so they will be together on the disk as well. Of course, if the physical medium is an SSD, you can skip the defragmentation and only consolidate.
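The consolidation half of that (not the physical defragmentation, which is up to the DBMS) could be sketched like this, again on SQLite via Python's `sqlite3` as a stand-in. `consolidate` is a hypothetical helper, and it assumes one file's data fits in memory. It reads a file's clusters in order, deletes them, and writes the bytes back as full clusters of a chosen size.

```python
import sqlite3

# Hypothetical SQLite version of the Files/Clusters schema from the answer.
SCHEMA = """
CREATE TABLE Files(ID INTEGER PRIMARY KEY);
CREATE TABLE Clusters(
  ID INTEGER PRIMARY KEY,
  FK_FileID INTEGER NOT NULL REFERENCES Files(ID),
  ClusterNumber INTEGER NOT NULL,
  Contents BLOB,
  UNIQUE (FK_FileID, ClusterNumber)
);
"""


def consolidate(conn: sqlite3.Connection, file_id: int, cluster_size: int) -> None:
    """Repack one file's clusters into full clusters of `cluster_size` bytes.

    Only the last cluster may be shorter. Loads the whole file into memory.
    """
    if cluster_size <= 0:
        raise ValueError("cluster_size must be positive")
    with conn:  # delete + re-insert as one transaction
        data = b"".join(
            contents
            for (contents,) in conn.execute(
                "SELECT Contents FROM Clusters WHERE FK_FileID = ? ORDER BY ClusterNumber",
                (file_id,),
            )
        )
        conn.execute("DELETE FROM Clusters WHERE FK_FileID = ?", (file_id,))
        # Re-insert in file order, so in SQLite they usually get consecutive rowids too.
        conn.executemany(
            "INSERT INTO Clusters(FK_FileID, ClusterNumber, Contents) VALUES (?, ?, ?)",
            [
                (file_id, number, data[start:start + cluster_size])
                for number, start in enumerate(range(0, len(data), cluster_size))
            ],
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO Clusters(FK_FileID, ClusterNumber, Contents) VALUES (?, ?, ?)",
        [(1, 0, b"0123456789"), (1, 1, b"WXYZ"), (1, 2, b"ABCDEFGHIJ")],
    )
    consolidate(conn, 1, 8)
    print([c for (c,) in conn.execute(
        "SELECT Contents FROM Clusters WHERE FK_FileID = 1 ORDER BY ClusterNumber")])
    # [b'01234567', b'89WXYZAB', b'CDEFGHIJ']
```

Where the rows end up on disk is still the DBMS's decision; in SQLite, rebuilding the database file itself is what `VACUUM` does.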

Advanced options include reserving expansion room for a file ahead of time (room that no other file will use), so its clusters can be kept together (even if not in order) and defragmentation is needed less often.

Christopher
  • So not what I hoped, but better than nothing :). Is it enough to set the stream position to the 10th byte and start overwriting from there? – BDevGW Nov 07 '19 at 23:37
  • @BDevGW IIRC, most FileStream classes have an insert mode. The caching might delay any actual work until the Stream is closed, but that cannot be guaranteed. – Christopher Nov 07 '19 at 23:40
  • @BDevGW: I could not find the insert mode switch. So I gave you my safe "modify file" code in the post. Word processors use something very similar to this pattern. – Christopher Nov 07 '19 at 23:52
  • So yes: go to the position and overwrite from there. – BDevGW Nov 08 '19 at 00:01
  • @BDevGW The thing is that overwrites/inserts are hella dangerous. Ignoring the performance (which is not really something you can avoid), the time it takes gives FUBAR plenty of time to strike and cut the power or crash the program halfway through the rewrite. That is why you first write a file with the modified content, then put the new file in the old file's place. | Of course, it would still be better to turn this file into a proper DB table already. Then you have a DBMS dealing with all those hassles. – Christopher Nov 08 '19 at 00:07
  • But what if it's not a database? What if I have a letter where I replace a word, or a world where I change a block? I don't understand what you mean by "proper DB table". – BDevGW Nov 08 '19 at 00:11
  • @BDevGW If it is a letter, then: A) it is so trivially small that having to rewrite it does not matter. B) Word processors already do exactly what I showed you to make certain an edit does not break the file. – Christopher Nov 08 '19 at 00:13
  • OK, in my case I have a database, but I don't know how large the fields will be, so I can't simply "overwrite field xy". After a change the data could have grown or shrunk by a few bytes. But if I cap the size, it would take a lot more space than necessary. – BDevGW Nov 08 '19 at 00:17
  • And the second type of information can't be predicted; all that can be said is that its size is a multiple of four. – BDevGW Nov 08 '19 at 00:18
  • @BDevGW. Ah, okay. You have the BLOBs stored as VARBINARY or similar in the database. How big are those documents? Could you fully load them into memory, do the edit in memory, then write the whole thing back to the DB? – Christopher Nov 08 '19 at 00:19
  • So I'm working on something that has clusters like an NTFS file system. But the information about which data is stored in which clusters is stored separately. This includes the name of the content and a "list" of clusters where the data is. For example, I have the information "address" stored in clusters two and three, but if it gets larger I have to add another cluster to the list, or if it gets renamed I have to change the name, and both can or will change the size of that entry (within the database). (More will follow in the next comment.) – BDevGW Nov 08 '19 at 00:23
  • There is no "big" problem to cap the length of the name to 256 bytes but this means that "test" and "this is a large name" will always take 256 bytes. And giving each entry a fixed number of clusters will either limit all information severely or make every entry take up a lot of unused space in my registry. The information itself is already stored in fixed-size units, so in the worst case only the information that is currently being saved will get lost on a crash. – BDevGW Nov 08 '19 at 00:26
  • @BDevGW: "So I'm working on something that has "cluster" like an NTFS system. But the information which information are store in what clusters are stored seperatly." That is not like a FileSystem. That is literally a FileSystem. They stored some extra stuff like "Paths", "Filenames" and "Attributes" with the content list. Later they added rights to that. But otherwise it is all that file systems have done since the FAT days: https://en.wikipedia.org/wiki/File_Allocation_Table#Concepts | NTFS still does the same thing, just in a bit more database-like way. – Christopher Nov 08 '19 at 00:36
  • I said "like" because it's not for storing files but other information (which can be really small, like a name, an ID or an address). But what's your suggestion for handling the registry? I have no better idea than making the registry entries variable-size and rewriting it completely on every change that changes the length of the data. – BDevGW Nov 08 '19 at 00:40
  • @BDevGW: "There is no "big" problem to cap the length of the name to 256 bytes but this means that "test" and "this is a large name" will always take 256 bytes" File systems have this exact issue. When formatting you have 3 choices: 1. Pick the minimum cluster size, to avoid waste. 2. Pick the maximum cluster size, to improve search/access performance and minimize fragmentation. 3. Keep the default value, which tries to balance both. – Christopher Nov 08 '19 at 00:42
  • I mean, what can you suggest regarding the cluster allocation in the registry? Any other idea than making this part variable-size, without exploding the registry size or limiting the amount of data per entry? – BDevGW Nov 08 '19 at 00:53
  • @BDevGW: If we are going to talk about this, I need to write down a basic schema. I think you should re-ask this question describing the real situation, with "database", "filesystems" and [specific DB you use] tags. C# and files do not really have anything to do with the problem. – Christopher Nov 08 '19 at 05:45
  • @BDevGW Okay, I updated my answer. – Christopher Nov 08 '19 at 06:08
  • OK, thank you very much, but I think we are talking past each other. I don't have any SQL(-like) database. I have my own little system. I'm only a hobby developer and do this only to improve my skills (here it's FileStream that I'm trying to improve). My system is built on two files. One is the database, where the data is stored in chunks of X bytes (the clusters). It will grow with more information, and if something is deleted only the reference is removed until new data overwrites it. – BDevGW Nov 08 '19 at 11:59
  • If no free cluster is left, it will grow by the required number of clusters, and on cleanup all clusters holding data will be put together and unused clusters will be removed to reduce the size again. The other one is like a list: "Information A in clusters 1, 2, 3", "Information B in clusters 4, 5"... – BDevGW Nov 08 '19 at 12:00
  • And here is the only problem I have: how do I store this list most efficiently? I have two options. I can make it dynamic, in which case I have to rewrite the file on each change, or I can limit the number of bytes per name and clusters per entry and give each entry a fixed amount of space, like "test1\0\0\0..." and clusters: 01, 02, 03, -1, -1, -1. Then I simply go to the position in the stream and write over it. But this will increase the size of my registry file and limit the number of clusters one entry can use. – BDevGW Nov 08 '19 at 12:00
  • So I have multiple options there again. I can set a small limit to keep everything small, or a large limit, which also means a larger registry, or I can increase the cluster size, but this will increase the size of the database itself enormously. Because of this I chose the first option and just wanted to know if I can insert something into the file without rewriting it, and after your first comments I wanted to know what your suggestion is. But thank you very much for your effort to rewrite your whole answer and for providing links. – BDevGW Nov 08 '19 at 12:00
  • @BDevGW: Okay, that means my assumptions were dead wrong. Sorry, but unless you are specific there is just no way to help you. This is not something we can talk about in generalized terms. – Christopher Nov 08 '19 at 16:08
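The thread ends without code for the original file question, so here is a minimal sketch of the two options it discusses, written in Python rather than C# (an assumption; the asker uses FileStream). As far as common file APIs go, FileStream included, there is no insert mode: everything after the offset has to be rewritten either way. The hypothetical `insert_bytes` follows the "write a file with the modified content, then put it in the old file's place" advice, and the hypothetical `overwrite_at` is the in-place overwrite, which only works when the new data is exactly as long as what it replaces, e.g. a fixed-size registry slot.

```python
import os
import shutil
import tempfile


def insert_bytes(path: str, offset: int, data: bytes, chunk_size: int = 64 * 1024) -> None:
    """Insert `data` before byte `offset` of the file at `path`.

    Writes a modified copy next to the original and swaps it in, so a crash
    midway leaves the original file untouched (plus possibly a stray temp file).
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the final rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            remaining = offset
            while remaining > 0:  # copy the first `offset` bytes unchanged
                chunk = src.read(min(chunk_size, remaining))
                if not chunk:
                    raise ValueError("offset is past the end of the file")
                dst.write(chunk)
                remaining -= len(chunk)
            dst.write(data)  # the inserted block
            shutil.copyfileobj(src, dst, chunk_size)  # the rest of the file
            dst.flush()
            os.fsync(dst.fileno())  # make sure the copy is on disk before the swap
        shutil.copymode(path, tmp_path)  # keep the original file's permission bits
        os.replace(tmp_path, path)  # atomic on POSIX when on the same file system
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def overwrite_at(path: str, offset: int, data: bytes) -> None:
    """Overwrite len(data) bytes in place, starting at `offset` (nothing is shifted)."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
```

In .NET the same pattern should map onto FileStream for the copy and `File.Replace` or `File.Move` for the swap, though that mapping is untested here. The cost of `insert_bytes` grows with the file size, which is why the thread's fixed-size-slot alternative trades space for cheap in-place writes.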