I have an embedded Linux system that stores data in a very large file, appending new data to the end. As the file grows near the available storage capacity, I need to remove the oldest data.

The problem is that I can't accept the disruption it would take to move the massive bulk of data "up" the file in the usual way - locking the file for an extended period just to rewrite it (plus, this being a flash medium, it would cause unnecessary wear).

Probably the easiest way would be to split the file into multiple smaller ones, but this has several downsides related to how the data is handled and processed - all the 'client end' software expects a single file. OTOH, it can handle the 'corruption' of having the first record cut in half, so the file doesn't need to be trimmed at record offsets, just 'somewhere up there', e.g. the first few inodes freed. The oldest data is obsolete anyway, so even more severe corruption of the beginning of the file is completely acceptable, as long as the 'tail' remains clean. Liberties can also be taken with how much exactly is removed - 'roughly the first several megabytes' is okay; no need for 'first 4096KB exactly' precision.

Is there some method, API, trick, or hack to truncate the beginning of a file like that?

SF.
  • Related: [Truncating the first 100MB of a file in linux](http://stackoverflow.com/questions/18072180/truncating-the-first-100mb-of-a-file-in-linux) – fedorqui Jun 16 '14 at 09:03
  • @fedorqui: "There is no **(portable, or filesystem neutral)** way to remove bytes from the start (or in the middle) of a file". I'm asking for the non-portable and non-fs-neutral ones. – SF. Jun 16 '14 at 09:09
  • 1
    @SF. despite the difference, you might find the answers useful. Particularly Joni's which links to a page that describes `EXT4_IOC_TRUNCATE_BLOCK_RANGE` which may be of use to you. – eerorika Jun 16 '14 at 09:11
  • If you assume a fixed file size, you could try a file based ring buffer implementation ([i.e.](http://stackoverflow.com/questions/7195384/how-to-implement-a-circular-buffer-using-a-file)) – Nils_M Jun 16 '14 at 09:17
  • @user2079303: Yes, that looks quite promising. – SF. Jun 16 '14 at 09:19
  • 1. use logrotate (multiple files); 2. use a ring buffer (plus a pointer); 3. use sparse files - but this is tricky and not portable, and what will you do when your file becomes, say, a 1 TB hole followed by 10 MB of data? How far do you want the file to grow? – Dima Tisnek Jun 23 '14 at 08:55
  • 1
    @qarma: Actually, that sparse files trick sounds rather good. At some 3MB/month maximum data growth, it would take ages for the file to grow to unreasonable dimensions. – SF. Jun 23 '14 at 10:27

3 Answers

You can achieve this with Linux kernel v3.15 or later on ext4/XFS file systems, using `fallocate()` with the `FALLOC_FL_COLLAPSE_RANGE` flag, which removes a range from the file without rewriting the data after it:

```c
/* remove the first 4096 bytes; offset and length must be multiples
   of the filesystem block size */
int ret = fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 0, 4096);
```

See [Truncating the first 100MB of a file in linux](http://stackoverflow.com/questions/18072180/truncating-the-first-100mb-of-a-file-in-linux).

Sunding Wei

The easiest solution for your old applications would be a FUSE filesystem which gives them access to the underlying file, but with the offset cyclically shifted. This would allow you to implement a ring buffer at the physical level. The FUSE layer would be fairly trivial, as it only needs to adjust all file positions by a constant, modulo the file size.

MSalters

What about setting up a separate process that renames the output file when it reaches a predefined size (for instance by appending the Unix time to the file name)?

This would allow you to keep the old data, and the main process would recreate the output file the next time it writes to it.

Another cron job could remove the old files every now and then.

Paolo Brandoli
  • This is the 'multiple files' approach. The problem is that when the file occupies, say, 90% of available storage, I'd want it scaled down to 80% of available storage, keeping the 'middle' data in, instead of just starting from scratch. – SF. Jun 16 '14 at 09:11
  • Then, if you also provide a service that parses that file, you could use it as a cyclic buffer: when it reaches a predefined size, you start writing it from the beginning. You could use a synchronizing byte (e.g. FF) and the timestamp immediately after the sync to understand where the data starts. – Paolo Brandoli Jun 16 '14 at 09:15