145

On Linux, I use stat --format="%s" FILE, but the Solaris machine I have access to doesn't have the stat command. What should I use then?

I'm writing Bash scripts and can't really install any new software on the system.

I've considered already using:

perl -e '@x=stat(shift);print $x[7]' FILE

or even:

ls -nl FILE | awk '{print $5}'

But neither of these looks sensible - running Perl just to get file size? Or running two programs to do the same?

  • 1
    well a bash script *is* software, and if you can put that on the system, you can install software. – just somebody Nov 29 '09 at 11:58
  • 5
    Technically true. I meant that I don't have root privileges and can't install new packages. Sure, installing in my home directory is possible, but not really an option when the script has to be portable; installing additional packages on "X" machines becomes tricky. –  Nov 29 '09 at 12:11

16 Answers

241

wc -c < filename (wc is short for word count; -c prints the byte count) is a portable, POSIX solution. Only the output format might not be uniform across platforms, as some spaces may be prepended (which is the case on Solaris).

Do not omit the input redirection. When the file is passed as an argument, the file name is printed after the byte count.

I was worried it wouldn't work for binary files, but it works OK on both Linux and Solaris. You can try it with wc -c < /usr/bin/wc. Moreover, POSIX utilities are guaranteed to handle binary files, unless specified otherwise explicitly.
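
If you need a bare number in a script, one way to strip the leading spaces that some platforms (e.g. Solaris) prepend is arithmetic expansion. A minimal sketch, not part of the original answer:

size=$(wc -c < "$FILE")   # may contain leading whitespace on some systems
size=$((size + 0))        # arithmetic expansion normalizes it to a plain number
echo "$size"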

Carl Smotricz
  • 71
    Or just `wc -c < file` if you don't want the filename appearing. – caf Nov 29 '09 at 23:06
  • 39
    If I'm not mistaken, though, `wc` in a pipeline must `read()` the entire stream to count the bytes. The `ls`/`awk` solutions (and similar) use a system call to get the size, which *should* be constant time (versus O(size)). – jmtd May 07 '11 at 16:40
  • 1
    I recall `wc` being very slow the last time I did that on a full hard disk. It was slow enough that I could re-write the script before the first one finished, came here to remember how I did it lol. – Camilo Martin Jul 27 '12 at 13:43
  • 7
    I wouldn't use `wc -c`; it looks much neater but `ls` + `awk` is better for speed/resource use. Also, I just wanted to point out that you actually need to post-process the results of `wc` as well because on some systems it will have whitespace before the result, which you may need to strip before you can do comparisons. – Haravikk Jul 28 '13 at 10:21
  • 1
    What would then be the best option to print the result in a human-friendly format, e.g. MB or KB? – Rdpi Jan 11 '15 at 04:39
  • 4
    `wc -c` is great, but it will not work if you don't have read access to the file. – Silas Jan 13 '16 at 16:47
  • 4
    The `stat` and `ls` utilities just execute the `lstat` syscall and get the file length without reading the file. Thus, they do not need the read permission, and their performance does not depend on the file's length. `wc` actually opens the file and usually reads it, making it perform much worse on large files. But [GNU coreutils wc](http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/wc.c) optimizes the case when only the byte count of a regular file is wanted: it uses the `fstat` and `lseek` syscalls to get the count. See the comment with `(dd ibs=99k skip=1 count=0; ./wc -c) < /etc/group` in its source. – Palec Jul 04 '17 at 10:40
  • Would it be efficient to use `wc -c < FILE` for very large files such as 100GB ? @Carl Smotricz – alper Nov 23 '18 at 05:54
  • 1
    @alper: I haven't tested, but I suspect that redirecting a large file like this is terribly slow. My answer was about measuring file size portably, not efficiently. For a quick size based on the directory data, you'd probably be better off looking at some of the other answers here. – Carl Smotricz Jan 23 '19 at 14:58
48

I ended up writing my own program (really small) to display just the size. More information is in bfsize - print file size in bytes (and just that).

The two cleanest ways in my opinion with common Linux tools are:

stat -c %s /usr/bin/stat

50000


wc -c < /usr/bin/wc

36912

But I just don't want to be typing parameters or piping the output just to get a file size, so I'm using my own bfsize.

fwhacking
  • 2
    First line of problem description states that stat is not an option, and the wc -c is the top answer for over a year now, so I'm not sure what is the point of this answer. –  Mar 11 '11 at 15:09
  • 28
    The point is in people like me who find this SO question in Google and `stat` _is_ an option for them. – yo' Nov 22 '12 at 21:05
  • 4
    I'm working on an embedded system where `wc -c` takes 4090 msec on a 10 MB file vs "0" msec for `stat -c %s`, so I agree it's helpful to have alternative solutions even when they don't answer the exact question posed. – Robert Calhoun Mar 09 '13 at 01:37
  • 4
    "stat -c" is not portable / does not accept the same arguments on MacOS as it does on Linux. "wc -c" will be very slow for large files. – Orwellophile Mar 20 '13 at 11:58
  • 1
    `stat` gives the size of a locked file, whereas `wc` does not (Cygwin under Windows on c:\pagefile.sys). – pbies May 31 '14 at 21:38
  • 3
    stat is not portable either. `stat -c %s /usr/bin/stat` `stat: illegal option -- c` `usage: stat [-FlLnqrsx] [-f format] [-t timefmt] [file ...]` –  May 26 '15 at 14:48
  • I did say that. Try my answer, based on **ls**; it should be quite portable: http://stackoverflow.com/a/15522969/912236 – Orwellophile Jun 01 '15 at 16:48
  • I always wondered why the `stat` CLI utility was never included in POSIX. – Ciro Santilli OurBigBook.com Oct 16 '18 at 07:18
39

Even though du usually prints disk usage and not actual data size, the GNU Core Utilities du can print a file's "apparent size" in bytes:

du -b FILE

But it won't work under BSD, Solaris, macOS, etc.
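
If the script has to run on both kinds of systems, one approach is to probe for GNU du and fall back otherwise. A rough sketch, not part of the original answer, assuming GNU coreutils may also be installed under the gdu name (as Homebrew's coreutils package does on macOS):

filesize() {
    # GNU du understands -b (apparent size in bytes); BSD/Solaris du does not
    if du -b /dev/null >/dev/null 2>&1; then
        du -b "$1" | cut -f1
    elif command -v gdu >/dev/null 2>&1; then
        gdu -b "$1" | cut -f1
    else
        wc -c < "$1" | tr -d ' '   # portable fallback; strips Solaris padding
    fi
}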

fwhacking
  • 6
    On MacOS X, `brew install coreutils` and `gdu -b` will achieve the same effect – Jose Alban Apr 19 '16 at 09:13
  • 3
    I prefer this method because `wc` needs to read the whole file before giving a result, whereas `du` is immediate. – CousinCocaine Jan 02 '17 at 11:36
  • 4
    POSIX mentions `du -b` in a completely different context in [`du` rationale](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/du.html#tag_20_36_18). – Palec Jul 04 '17 at 11:39
  • This uses just the `lstat` call, so its performance does not depend on file size. Shorter than `stat -c '%s'`, but less intuitive and works differently for folders (prints size of each file inside). – Palec Jul 04 '17 at 12:00
  • 1
    [FreeBSD `du`](https://www.freebsd.org/cgi/man.cgi?query=du&sektion=1) can get close using `du -A -B1`, but it still prints the result in multiples of 1024 B blocks. I did not manage to get it to print the byte count. Even setting `BLOCKSIZE=1` in the environment does not help, because 512 B blocks are used then. – Palec Jul 04 '17 at 12:31
  • Where *does* it work? Only on Linux? – Peter Mortensen Jan 26 '22 at 18:41
13

Finally I decided to use ls, and Bash array expansion:

TEMP=( $( ls -ln FILE ) )
SIZE=${TEMP[4]}

It's not really nice, but at least it does only one fork+execve, and it doesn't rely on a secondary programming language (Perl, Ruby, Python, or whatever).
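
Wrapped as a small function for reuse, this is a sketch along the same lines (not part of the original answer):

filesize() {
    local temp
    temp=( $( ls -ln "$1" ) ) || return
    echo "${temp[4]}"
}

filesize FILE    # prints the size in bytes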

  • Just an aside - the 'l' in '-ln' is not required; '-n' is exactly the same as '-ln' – barryred May 14 '13 at 13:07
  • 1
    No, it's not. Just compare outputs. –  May 14 '13 at 16:33
  • 1
    One would guess the portable `ls -ln FILE | { read _ _ _ _ size _ && echo "$size"; }` needs not fork for the second step of the pipeline, as it uses just built-ins, but Bash 4.2.37 on Linux forks twice (still only one `execve`, though). – Palec Jul 04 '17 at 13:07
  • `read _ _ _ _ size _ <<<"$(exec ls -ln /usr/bin/wc)" && echo "$size"` works with a single fork and a single exec, but it uses a temporary file for the here-string. It can be made portable by replacing the here-string with a POSIX-compliant [here-document](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_07_04). BTW, note the `exec` in the subshell. Without that, Bash performs one fork for the subshell and another one for the command running inside. This is the case in the code you provide in this answer, too. – Palec Jul 04 '17 at 13:29
  • Forks should not be a problem; most people do not write an `exec` in a subshell that contains only one command. A temporary file is worse, but still, come on, this is just shell. Trying to limit the number of forks so strictly is definitely **premature optimization**, i.e. the root of all evil. Portability, readability and short length should beat small performance gains like this one. And if you need to optimize a working shell script, you should probably rewrite it (or at least its critical parts) to C. – Palec Jul 04 '17 at 13:40
  • 1
    The `-l` is superfluous in presence of `-n`. Quoting [POSIX `ls` manpage](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/ls.html): *`-n`: Turn on the `-l` (ell) option, but when writing the file's owner or group, write the file's numeric UID or GID rather than the user or group name, respectively. Disable the `-C`, `-m`, and `-x` options.* – Palec Jul 04 '17 at 15:03
10

BSD systems have stat with different options from the GNU Core Utilities one, but with similar capabilities.

stat -f %z <file name>

This works on macOS (tested on 10.12), FreeBSD, NetBSD and OpenBSD.
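
If a script has to cope with both GNU and BSD userlands, one common pattern is to try one stat flavour and fall back to the other. A minimal sketch, not taken from the answer:

filesize() {
    # GNU stat first; fall back to BSD/macOS stat if -c is rejected
    stat -c %s "$1" 2>/dev/null || stat -f %z "$1"
}

filesize FILE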

user7504315
9

When processing ls -n output, as an alternative to the poorly portable shell arrays, you can use the positional parameters, which form the only array and the only local variables available in the standard shell. Wrap the overwriting of the positional parameters in a function to preserve the original arguments to your script or function.

getsize() { set -- $(ls -dn "$1") && echo $5; }
getsize FILE

This splits the output of ls -dn according to the current IFS environment variable settings, assigns it to the positional parameters, and echoes the fifth one. The -d ensures directories are handled properly, and the -n ensures that user and group names do not need to be resolved, unlike with -l. Also, user and group names containing whitespace could theoretically break the expected line structure; they are usually disallowed, but this possibility still makes the programmer stop and think.

Richard
8

The fastest cross-platform solution (it uses only a single fork() for ls, doesn't attempt to count the actual characters, and doesn't spawn unneeded awk, perl, etc.).

It was tested on Mac OS X and Linux. It may require minor modification for Solaris:

__ln=( $( ls -Lon "$1" ) )
__size=${__ln[3]}
echo "Size is: $__size bytes"

If required, simplify ls arguments, and adjust the offset in ${__ln[3]}.

Note: It will follow symbolic links.

Orwellophile
  • 1
    Or put it in a shell script: ls -Lon "$1" | awk '{ print $4 }' – Luciano Apr 21 '16 at 13:10
  • 1
    @Luciano I think you have totally missed the point of **not forking** and doing a task in **bash** rather than using bash to string a lot of unix commands together in an inefficient fashion. – Orwellophile Jun 08 '16 at 09:05
5

If you use find from GNU findutils:

size=$( find . -maxdepth 1 -type f -name filename -printf '%s' )

Unfortunately, other implementations of find usually support neither -maxdepth nor -printf. This is the case for, e.g., the Solaris and macOS find.

Dennis Williamson
  • FYI maxdepth is not needed. It could be rewritten as `size=$(test -f filename && find filename -printf '%s')`. – Palec Feb 26 '14 at 00:49
  • @Palec: The `-maxdepth` is intended to prevent `find` from being recursive (since the `stat` which the OP needs to replace is not). Your `find` command is missing a `-name` and the `test` command isn't necessary. – Dennis Williamson Feb 26 '14 at 01:39
  • @DennisWilliamson `find` searches its parameters recursively for files matching given criteria. If the parameters are not directories, the recursion is… quite simple. Therefore I first test that `filename` is really an existing ordinary file, and then I print its size using `find` that has nowhere to recurse. – Palec Feb 26 '14 at 04:38
  • 1
    `find . -maxdepth 1 -type f -name filename -printf '%s'` works only if the file is in the current directory, and it may still examine each file in the directory, which might be slow. Better use (even shorter!) `find filename -maxdepth 1 -type f -printf '%s'`. – Palec Jul 04 '17 at 11:13
3

You can use the find command to select a set of files (here, backup files ending in ~). Then you can use the du command with the -h switch to print each file's size in a human-readable form.

find $HOME -type f -name "*~" -exec du -h {} \;

Output:

4.0K    /home/turing/Desktop/JavaExmp/TwoButtons.java~
4.0K    /home/turing/Desktop/JavaExmp/MyDrawPanel.java~
4.0K    /home/turing/Desktop/JavaExmp/Instream.java~
4.0K    /home/turing/Desktop/JavaExmp/RandomDemo.java~
4.0K    /home/turing/Desktop/JavaExmp/Buff.java~
4.0K    /home/turing/Desktop/JavaExmp/SimpleGui2.java~
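
If exact byte counts are wanted instead of the block-rounded -h figures, GNU du can report each file's apparent size in bytes. A sketch assuming GNU du (see the du -b answer above); it is not part of the original answer:

find "$HOME" -type f -name "*~" -exec du -b {} \;
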
Abhishek Singh
2

Your first Perl example doesn't look unreasonable to me.

It's for reasons like this that I migrated from writing shell scripts (in Bash, sh, etc.) to writing all but the most trivial scripts in Perl. I found that I was having to launch Perl for particular requirements, and as I did that more and more, I realised that writing the scripts in Perl was probably a more powerful (in terms of the language and the wide array of libraries available via CPAN) and more efficient way to achieve what I wanted.

Note that other shell-scripting languages (e.g., Python and Ruby) will no doubt have similar facilities, and you may want to evaluate these for your purposes. I only discuss Perl since that's the language I use and am familiar with.

Brian Agnew
  • 1
    Well, I do a lot of Perl writing myself, but sometimes the tool is chosen for me, not by me :) –  Nov 29 '09 at 12:08
1

I don't know how portable GNU Gawk's filefuncs extension is. The basic syntax is

time gawk -e '@load "filefuncs"; BEGIN {
     fnL[1] = ARGV[ARGC-1];
     fts(fnL, FTS_PHYSICAL, arr); print "";

     for (fn0 in arr) {
         print arr[fn0]["path"] \
           " :: "arr[fn0]["stat"]["size"]; };

     print ""; }' genieMV_204583_1.mp4


genieMV_204583_1.mp4 :: 259105690
real    0m0.013s


ls -Aln genieMV_204583_1.mp4

----------  1 501  20  259105690 Jan 25 09:31 genieMV_204583_1.mp4

That syntax allows checking multiple files at once. For a single file, it's

time gawk -e '@load "filefuncs"; BEGIN {
      stat(ARGV[ARGC-1], arr);
      printf("\n%s :: %s\n", arr["name"], \
           arr["size"]); }' genieMV_204583_1.mp4

   genieMV_204583_1.mp4 :: 259105690
   real    0m0.013s

There are hardly any incremental savings, and admittedly it is slightly slower than stat straight up:

time stat -f '%z' genieMV_204583_1.mp4

259105690
real    0m0.006s (BSD-stat)


time gstat -c '%s' genieMV_204583_1.mp4

259105690
real    0m0.009s (GNU-stat)

And finally, a terse method of reading every single byte into an AWK array. This method works for binary files (doing it up front or at the end makes no difference):

time mawk2 'BEGIN { RS = FS = "^$";
     FILENAME = ARGV[ARGC-1]; getline;
     print "\n" FILENAME " :: "length"\n"; }' genieMV_204583_1.mp4

genieMV_204583_1.mp4 :: 259105690
real    0m0.270s


time mawk2 'BEGIN { RS = FS = "^$";
   } END { print "\n" FILENAME " :: " \
     length "\n"; }'  genieMV_204583_1.mp4

genieMV_204583_1.mp4 :: 259105690
real    0m0.269s

But that's not the fastest way, because you're storing it all in RAM. The normal AWK paradigm operates upon lines. The issue is that for binary files like MP4 files, if they don't end exactly on \n, summing length + NR would overcount by one. The code below is a catch-all: it explicitly uses the last 1 or 2 bytes of the file as the line splitter RS.

I found that the 2-byte method is much faster for binaries, and the 1-byte method faster for a typical text file that ends with newlines. With binaries, the 1-byte one may end up splitting rows far too often and slowing things down.

But we're close to nitpicking here, since all it took mawk2 to read in every single byte of that 1.83 GB .txt file was 0.95 seconds, so unless you're processing massive volumes, it's negligible.

Nonetheless, stat is still by far the fastest, as mentioned by others, since it's an OS filesystem call.

time mawk2 'BEGIN { FS = "^$";
    FILENAME = ARGV[ARGC-1];
    cmd = "tail -c 2 \""FILENAME"\"";
    cmd | getline XRS;
    close(cmd);

    RS = ( length(XRS) == 1 ) ? ORS : XRS ;

} { bytes += length } END {

    print FILENAME " :: "  bytes + NR * length(RS) }' genieMV_204583_1.mp4

        genieMV_204583_1.mp4 :: 259105690
        real    0m0.092s

        m23lyricsRTM_dict_15.txt :: 1961512986
        real    0m0.950s


ls -AlnFT "${m3t}" genieMV_204583_1.mp4

-rw-r--r--  1 501  20  1961512986 Mar 12 07:24:11 2021 m23lyricsRTM_dict_15.txt

-r--r--r--@ 1 501  20   259105690 Jan 25 09:31:43 2021 genieMV_204583_1.mp4

(The file permissions for the MP4 file were updated because the AWK method required it.)

RARE Kpop Manifesto
1

I'd use ls for better speed instead of wc, which has to read the whole stream in a pipeline:

ls -l <filename> | cut -d ' ' -f5
  • This is in plain bytes

  • Use the flag --b M or --b G for the output in megabytes or gigabytes (though, as @Andrew Henle points out in the comments, this is not portable).

BTW, if you're planning to go the du + cut route:

du -b <filename> | cut -f -1
  • use -h for more human-readable output

Or, with du + awk:

du -h <filename> | awk '{print $1}'

Or stat:

stat <filename> | grep Size: | awk '{print $2}'
PYK
  • *Use the flag `--b M` or `--b G` for the output in Megabytes or Gigabytes* Note, though, that neither of those are portable. https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/utilities/ls.html – Andrew Henle Nov 18 '21 at 22:05
  • For macOS use `ls -l | cut -d " " -f8` – Sanaf Aug 28 '22 at 11:16
-3

If you have Perl on your Solaris, then use it. Otherwise, ls with AWK is your next best bet, since you don't have stat and your find is not GNU find.

ghostdog74
-3

There is a trick in Solaris I have used. If you ask for the size of more than one file, it returns just the total size with no names - so include an empty file like /dev/null as the second file:

For example,

command fileyouwant /dev/null

I can't remember which size command this works for - ls, wc, etc. - unfortunately I don't have a Solaris box to test it.

Martin Beckett
-4

On Linux you can use du -h $FILE. That may work on Solaris too.

knittl
  • 1
    Actually, units could be converted, but this shows disk usage instead of file data size ("apparent size"). – Palec Jul 04 '17 at 11:55
-7

Try du -ks | awk '{print $1*1024}'. That might just work.

Aditya