
I was reading the answers to this question and this one. Comparing files to C arrays: if we want to remove some elements from the middle of a C array, we have to shift all the other elements, which takes a lot of time. But if we want to remove the first few items or the last few, we can just change the pointer (or length) and deallocate the removed elements, which takes almost no time and is independent of the length of the array. I was wondering why there is no such way to truncate files. I thought maybe there is some metadata about the file at its beginning or end that causes the issue, but if that is the case, the metadata must sit at either the beginning or the end, so we should still be able to remove a few lines quickly from at least one of the two ends. But it seems like that's not possible. Why is that? What am I missing here?

I need this because I have a 10 GB file from which I have to remove lines from the beginning or the end, one by one, until it's empty. I'm on Ubuntu 16.04, but I would love to know if there are solutions on other OSes too.

yukashima huksay
  • What you are suggesting would mean tampering with the FAT (File Allocation Table) or similar structure depending on your file system... This is *possible*, but rather complicated and dangerous... In Windows, there are API functions for this but I'm not sure where the documentation is – Grantly Jan 05 '18 at 12:44
  • @Grantly what about linux? – yukashima huksay Jan 05 '18 at 12:45

1 Answer


This is a very simplistic answer, but files are normally stored on disk in 'pages/blocks' of a fixed number of bytes, and as far as I know all blocks are filled completely (except the last one). So in theory it would be relatively easy to remove an exact multiple of the page/block size from the front of a file. In practice, however, the chance that you need to remove exactly that number of bytes from the beginning of a file is very low.

For the end of the file it would be easier, but this case also does not come up often, and therefore no 'generic' way is implemented for it.

Michel Keijzers
  • I didn't get why it's not possible for the end of the file. Can you explain it a little bit more? – yukashima huksay Jan 05 '18 at 12:50
  • Aren't there any file-systems that store beginning and end for each block i.e. have variable sized blocks? – yukashima huksay Jan 05 '18 at 12:52
  • It is possible for the end of the file, but it is probably not a 'wanted' feature (it happens very rarely that the end of a file needs to be stripped). – Michel Keijzers Jan 05 '18 at 12:52
  • It would be very strange to me if no one has implemented that. I understand that it's probably not used very often, but it has a huge performance gain. So how can I implement it myself? – yukashima huksay Jan 05 '18 at 12:55
  • Mostly, the blocks do not even have to be consecutive, but how would you remove 200 bytes from the first block if it holds e.g. 400 bytes? Then you would have to move all the bytes in the remainder of the file (blocks), unless there is a file system that can handle a 'fill' rate per block (this would probably be very efficient for storage size). – Michel Keijzers Jan 05 '18 at 12:55
  • You can implement it yourself maybe, but it is a huge amount of work, because it depends on the physical (page) size of the disk, and what about SSD etc. – Michel Keijzers Jan 05 '18 at 12:57
  • Oh, actually my disk is an SSD; doesn't an SSD have totally different logic? Would you be so kind as to give me a tiny introduction to it if so? – yukashima huksay Jan 05 '18 at 13:02
  • It might be; I don't know the details of disk I/O systems, to be honest. But what if you later use a different setup? Then you would have to change your program based on a hardware solution. Why don't you change your program so it starts and ends reading from/until a specific position in the file... then you don't need to change the file at all (if that is possible). – Michel Keijzers Jan 05 '18 at 13:05
  • Is it possible to bring just a fraction of the file into memory? My program has to do a task based on every line of a file. It is sometimes interrupted and then run again, continuing from where it left off, so I thought of this as a solution. Do you have any better idea with better performance? – yukashima huksay Jan 05 '18 at 13:18
  • Noting that it will be interrupted/killed every few lines. – yukashima huksay Jan 05 '18 at 13:19
  • Yes, you can read a part of a file starting from an offset (using seek or fseek, if I'm right) and read a number of bytes from that offset. Just store those in memory, process them, and after an interruption (or not) read more bytes and process them. This is independent of files/disks or even streaming data. – Michel Keijzers Jan 05 '18 at 13:41