33

I am referring to the question How can you concatenate two huge files with very little spare disk space?

I'm in the midst of implementing the following:

  1. Allocate a sparse file of the combined size.
  2. Copy 100 MB from the end of the second file to the end of the new file.
  3. Truncate 100 MB from the end of the second file.
  4. Repeat steps 2 and 3 until the second file is exhausted (with step 2 writing to the correct place in the destination file).
  5. Repeat steps 2-4 with the first file.
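The steps above can be sketched at small scale with GNU coreutils (`truncate`, `stat`, and `dd` with its byte-granularity flags). This is only an illustration under those assumptions: it uses 4 KiB chunks and small demo files that it creates itself, instead of 100 MB chunks of real multi-gigabyte files.

```shell
# Scaled-down sketch of the five steps, assuming GNU coreutils
# (truncate, stat, dd with skip_bytes/seek_bytes/count_bytes).
set -e
chunk=4096                                   # 100 MB in the real scenario
seq 1 2000 > first.bin                       # demo stand-ins for the two huge files
seq 2001 4000 > second.bin
cat first.bin second.bin > expected.bin      # reference copy, for verification only
s1=$(stat -c %s first.bin); s2=$(stat -c %s second.bin)
truncate -s $((s1 + s2)) combined.bin        # step 1: sparse file of the combined size

shrink_into() {                              # steps 2-4: move tail of $1 so it ends at offset $2
    src=$1; end=$2
    size=$(stat -c %s "$src")
    while [ "$size" -gt 0 ]; do
        n=$chunk
        if [ "$size" -lt "$n" ]; then n=$size; fi
        size=$((size - n)); end=$((end - n))
        dd if="$src" of=combined.bin skip="$size" seek="$end" count="$n" \
           iflag=skip_bytes,count_bytes oflag=seek_bytes conv=notrunc status=none
        truncate -s "$size" "$src"           # step 3: drop the chunk just copied
    done
}
shrink_into second.bin $((s1 + s2))          # steps 2-4 on the second file
shrink_into first.bin  "$s1"                 # step 5: same for the first file
cmp expected.bin combined.bin && echo "files concatenated correctly"
```

Note that `conv=notrunc` is essential: without it, each `dd` invocation would truncate the output file at the seek offset and destroy the tail written by the previous pass.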

I would like to know if anyone here knows how to "truncate" a given file in Linux from the front. The truncation is by file size: for example, if the file is 10GB, I would like to truncate the first 100MB and leave the file with the remaining 9.9GB. Can anyone help with this?

Thanks

CheeHow
  • Did you google for `Linux file truncate`? It would give you good answers! – Basile Starynkevitch Aug 06 '13 at 05:22
  • possible duplicate of [Truncate file at front](http://stackoverflow.com/questions/706167/truncate-file-at-front) – Ciro Santilli OurBigBook.com Oct 16 '14 at 09:22
  • [How do I remove the first 300 million lines from a 700 GB txt file on a system with 1 TB max disk space?](https://unix.stackexchange.com/q/610494) on unix.SE points out that you can `dd` in place (conv=notrunc) to copy the data earlier in the file before truncating, getting the job done with no extra disk space needed. But that's horrible as part of a repeated process to shift data from the start of one file into the end of another. – Peter Cordes Sep 22 '20 at 02:19

9 Answers

33

As of Linux kernel v3.15, this is now reality on ext4/xfs.

See http://man7.org/linux/man-pages/man2/fallocate.2.html

Testing code

#define _GNU_SOURCE             /* for fallocate() and FALLOC_FL_COLLAPSE_RANGE */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>             /* write(), close() */
#include <fcntl.h>

#ifndef FALLOC_FL_COLLAPSE_RANGE
#define FALLOC_FL_COLLAPSE_RANGE        0x08
#endif

int main(int argc, const char * argv[])
{
    int ret;
    char * page = malloc(4096);
    int fd = open("test.txt", O_CREAT | O_TRUNC | O_RDWR, 0644);

    if (fd == -1) {
        free(page);
        return (-1);
    }

    // Page A
    printf("Write page A\n");
    memset(page, 'A', 4096);
    write(fd, page, 4096);

    // Page B
    printf("Write page B\n");
    memset(page, 'B', 4096);
    write(fd, page, 4096);

    // Remove page A
    ret = fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 0, 4096);
    printf("Page A should be removed, ret = %d\n", ret);

    close(fd);
    free(page);

    return (0);
}
Sunding Wei
  • `_GNU_SOURCE` needs to be defined before the inclusion of `fcntl.h` – at least on Ubuntu 16.04. Only then are `fallocate` and `FALLOC_FL_COLLAPSE_RANGE` available as gnu-specific (experimental) features. – Hermann Apr 17 '18 at 17:22
  • great! but how do i use from bash ? suppose I don't want to compile c code – Pavel Niedoba May 03 '18 at 12:21
  • In shell you can use `man 1 fallocate`. Like this: `fallocate -c -o offset -l length filename`. You need `apt install util-linux` (as of Ubuntu 18.04). – gluk47 Oct 20 '18 at 19:09
31

Chopping off the beginning of a file is not possible with most file systems and there's no general API to do it; for example the truncate function only modifies the ending of a file.

You may be able to do it with some file systems though. For example the ext4 file system recently got an ioctl that you may find useful: http://lwn.net/Articles/556136/


Update: About a year after this answer was written, support for removing blocks from the beginning and middle of files on ext4 and xfs file systems was added to the fallocate function, by way of the FALLOC_FL_COLLAPSE_RANGE mode. It's more convenient than using the low-level ioctls yourself.

There's also a command line utility with the same name as the C function. Assuming your file is on a supported file system (and noting that the offset and length must be multiples of the file system block size), this will delete the first 100MB:

fallocate -c -o 0 -l 100M yourfile

And this will delete the first 1GB:

fallocate -c -o 0 -l 1G yourfile
Joni
5

Please read a good Linux programming book, e.g. Advanced Linux Programming.

You need to use Linux kernel syscalls, see syscalls(2)

In particular truncate(2) (both for truncation, and for extending a sparse file on file systems supporting it), and stat(2), notably to get the file size.

There is no (portable, or filesystem neutral) way to remove bytes from the start (or in the middle) of a file, you can truncate a file only at its end.

Basile Starynkevitch
  • 1
    yes, that is exactly what my problem is. Anyway, as far as I know, truncate in linux only truncate to a fixed file size. for example if you want your file size to be 4KB, you simply do `truncate -s 4k filename.txt`. What I want is to have my file reduce its either head or tail by 100MB. Is that achievable? – CheeHow Aug 06 '13 at 05:35
5

If you can work with ASCII lines and not bytes, then removing the first n lines of a file is easy. For example to remove the first 100 lines:

sed -i 1,100d /path/to/file
lyderic
2

I found I had to use a combination of fallocate and sed before the file would shrink in size. I had a 43MB file and wanted to get it down to around 5MB:

fallocate -p -o 0 -l 38m fallocate.log

I noticed this filled the first line with a bunch of "nonsense" characters, but my file was still 43MB in size.

I then used sed to delete the first line:

sed -i 1d fallocate.log

The file is now 4.2MB in size.

jjengel
2

Related: How do I remove the first 300 million lines from a 700 GB txt file on a system with 1 TB max disk space? on unix.SE points out that you can dd in place (conv=notrunc) to copy the data earlier in the file before truncating, getting the job done with no extra disk space needed.

That's horrible as part of a repeated process to shift data from the start of one file into the end of another. But worth mentioning for other use-cases where the purpose of truncating the front is to actually bring a specific point in the file to the front, not just to free disk space.


I would like to truncate the first 100MB of the file and leave the file with remaining 9.9GB

That's the opposite of what the list of steps says to do, from the answer on How can you concatenate two huge files with very little spare disk space? which you say you're following. @Douglas Leeder suggested copying into the middle of a sparse file so you only need to truncate at the end, which is easy and portable with a POSIX ftruncate(2) system call on the open fd you're using to read that file.


But if you want to avoid copying the first file, and just append the 2nd file to the end of the first, yes, you do need to free data at the start of the 2nd file after you've read it. But note that you don't need to fully truncate it: you just need to free that space, e.g. by making the existing file sparse, replacing that allocated space with a "hole".

The Linux-specific system call fallocate(2) can do that with FALLOC_FL_PUNCH_HOLE on FSes including XFS (since Linux 2.6.38), ext4 (since 3.0), BTRFS (since 3.7).

So it's available earlier than FALLOC_FL_COLLAPSE_RANGE (Linux 3.15) which shortens the file instead of leaving a hole. Linux 3.15 is pretty old by now so hopefully that's irrelevant.

Punching holes in data after you read it (and get it safely written to the other file) is perhaps simpler than shifting data within the file, in terms of being sure of the semantics for file position of a file descriptor you're reading from, if it's open while you use FALLOC_FL_COLLAPSE_RANGE.

The fallocate(1) command-line tool is built around that system call, allowing you to do either of those things on systems that support them.
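As a scaled-down illustration of hole punching with the fallocate(1) tool (the file name and sizes here are arbitrary, and punch-hole support is needed from the file system, e.g. ext4, xfs, or btrfs):

```shell
# Create a 4 MiB demo file, then punch a hole over its first 1 MiB.
dd if=/dev/zero of=demo.bin bs=1M count=4 status=none
fallocate -p -o 0 -l 1M demo.bin   # -p / --punch-hole frees the blocks
stat -c %s demo.bin                # apparent size is unchanged: 4194304
# On supporting file systems, `du demo.bin` now reports roughly 3 MiB
# actually allocated: the hole occupies no disk space.
```

Unlike FALLOC_FL_COLLAPSE_RANGE, this leaves the file's apparent size and all offsets intact, which is what makes it safe to use on a file you still have open for reading.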

Peter Cordes
1

This is a pretty old question by now, but here is my take on it. Excluding the requirement for it to be done with limited space available, I would use something like the following to remove the first 100MB of a file:

$ tail --bytes=$(expr $(wc -c < logfile.log) - 104857600) logfile.log > logfile.log.tmp
$ mv logfile.log.tmp logfile.log

Explanation:

  • This outputs the last nn bytes of the file (tail --bytes).
  • The number of bytes to output is calculated as the size of the file (wc -c < logfile.log) minus 100MB (expr $( ... ) - 104857600). This leaves us taking the tail of the file minus 100MB (e.g. 9.9GB).
  • This is then output to a temp file, which is moved back to the original file name, leaving the truncated file.
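The same effect can be had without computing the size first, since GNU tail accepts a 1-based `+N` byte offset ("start at byte N"). Shown here at a small scale with a demo file rather than a 10GB log:

```shell
# Drop the first 10 bytes of a demo file using tail's +N form
# (+N means "start at byte N", so skipping 10 bytes needs +11).
printf '0123456789ABCDEF' > logfile.log
tail -c +11 logfile.log > logfile.log.tmp
mv logfile.log.tmp logfile.log
cat logfile.log    # ABCDEF
```

For the 100MB case that would be `tail -c +$((104857600 + 1)) logfile.log > logfile.log.tmp`. Like the original commands, this still needs temporary space for the copy, so it doesn't address the low-disk-space constraint.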
Willem van Ketwich
1

Remove all but the last 10,000 lines from a file:

sed -i 1,$(( $(wc -l < path/to/file) - 10000 ))d path/to/file
1

Option 1 -- cut -b <FIRST_BYTE_TO_KEEP>- <file_name> > <new_file> (note that cut -b takes 1-based byte positions, not KB, and writes to stdout, hence the redirect)

Option 2 -- echo "$(tail -<NO_OF_LINES> <file_name>)" > <file_name> (this holds the whole remainder in the shell's memory, so it only suits small files)

Its not blank