
I am trying to create a large tree of directories using the script below:

for i in {1..5000}
do
    mkdir $i
    cd $i
done

but the script seems to stop after creating 1038 to 1040 directories. Does Linux impose some kind of limit here?

John Kugelman
PaxPrz
  • I'm voting to close this question as off-topic because it belongs in [unix.se], where it seems already to [have an answer](https://unix.stackexchange.com/a/32834/46316), that is there is a 4096 character path length limit, which your script exceeds. – Ken Y-N Jan 20 '20 at 04:42
  • @KenY-N: If the answer you link really says that, it's mistaken. But it's mostly about filename limits, not filepath limits. Linux doesn't really have a filepath limit. – rici Jan 20 '20 at 05:34
  • @pax: what are the actual symptoms you see? Are you certain the script did not run to the end? – rici Jan 20 '20 at 05:39
  • It's due to a Linux limitation in `linux/limits.h`, which has `#define PATH_MAX 4096`: https://stackoverflow.com/a/9449307/3589567 – Alejandro Blasco Jan 20 '20 at 06:21
  • @alejandroBlasco: read the [article](http://insanecoding.blogspot.com/2007/11/pathmax-simply-isnt.html) linked in that answer. It's not a limitation on path length. – rici Jan 20 '20 at 15:21
  • @rici: Yes. If the script had run to the end, it would have created all the folders. – PaxPrz Jan 20 '20 at 15:27
  • So increasing buffer by changing PATH_MAX value can help me? – PaxPrz Jan 20 '20 at 15:28
  • @Pax: It did create all the folders. I'm asking you why exactly you think it didn't. (Don't take it as an attack. I understand that it is not obvious that the folders were created. It's useful to be clear about how you collect evidence.) – rici Jan 20 '20 at 15:32
  • For example: did your script produce an error message? If so, you should put that in your question. (The script will produce an error message if you use the `dash` shell, which might be `sh` on your system.) – rici Jan 20 '20 at 15:34
  • @rici: No there weren't any error messages – PaxPrz Jan 20 '20 at 15:34
  • Right. So what makes you think that the script didn't run to the end? – rici Jan 20 '20 at 15:35
  • @rici: Because I cannot list any directories after 1038. – PaxPrz Jan 20 '20 at 15:36
  • @KenY-N shell scripts are programs too, and I've seen multiple questions about such here on StackOverflow. And now that Windows can run Linux shells it would be useful to know if there's a difference in behavior. – Mark Ransom Jan 20 '20 at 17:33

1 Answer


Linux doesn't really have a limit on the length of a file path. (Most Linux filesystems do limit the length of a file name, typically to 255 bytes. But there is no limit to the number of components in a path.) So your script probably worked.

However, bash (and other shells) do have problems with long path names. In particular, bugs in the bash interactive shell mean that once you are so deep in a directory hierarchy that the length of the current working directory's pathname exceeds some limit, you no longer see the correct path information in your terminal session. That can be very confusing; it might lead you to believe that the script failed.

Here's a sample terminal session. I start by changing my prompt to avoid filling the entire screen with the current path:

~/tmp$ export PS1='\W\$ '
tmp$ for i in {1..5000}; do mkdir $i; cd $i; done
10$ # The prompt is incorrect!
10$ basename "$PWD"
5000
10$ cd ~/tmp
tmp$ # count the number of directories in the hierarchy
tmp$ find 1 -type d | wc -l
5000
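To take the prompt out of the picture entirely, the same experiment can be scripted so that it reports its own failures and then counts what was created. This is only a minimal sketch: the function names `make_chain` and `count_dirs` are hypothetical, and it assumes a `find` that copes with very deep trees, as the one in the session above did.

```shell
#!/bin/sh
# Sketch: build a chain of nested directories and verify it without
# relying on the shell prompt. Both function names are hypothetical.

# make_chain DEPTH: create DEPTH nested directories named 1, 2, 3, ...
# below the current directory. The body runs in a subshell, so the
# caller's working directory is unchanged. Stops at the first mkdir or
# cd that fails and says at which depth that happened.
make_chain() (
    i=1
    while [ "$i" -le "$1" ]; do
        mkdir "$i"  || { echo "mkdir failed at depth $i" >&2; exit 1; }
        cd "./$i"   || { echo "cd failed at depth $i" >&2; exit 1; }
        i=$((i + 1))
    done
)

# count_dirs DIR: print the number of directories in the tree rooted at
# DIR, counting DIR itself. The arithmetic expansion strips any padding
# that wc may add.
count_dirs() {
    echo $(( $(find "$1" -type d | wc -l) ))
}
```

Running `make_chain 5000 && count_dirs 1` in an empty directory should print 5000 if every level was created. Any shell that cannot follow the chain (dash, for instance, as discussed below) shows the depth at which it gave up instead of failing silently.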

An unrelated problem, also exhibited by bash, is that very deep file hierarchies cause certain actions (such as cd) to be very slow, presumably because the shell chooses to validate the entire path from the root. In the case of bash, that results in a call to stat on every component of the path, so the mkdir/cd loop in your script becomes quadratic in the depth of the file tree. zsh, which otherwise seems to handle long pathnames correctly, also slows down enormously when moving to a directory whose pathname has a lot of components, although not as much (nor as regularly) as bash.

Shells and long pathnames, a brief summary

  • bash — The bash cd built-in works but as noted above is excessively slow when there are a lot of components. Bash's built-in pwd seems to work fine, but the same cannot be said of the substitution logic in $PS1, which is used to produce the prompt in an interactive shell. Very long prompts also confuse the readline library, so you'll find that your bash session is essentially unusable if your default $PS1 includes \w (full path to working directory). As shown in the terminal session above, you can change the prompt to use \W (basename of working directory), but since the pathname it is computed from was already truncated, it produces a very misleading prompt.

  • dash — The dash cd built-in cannot handle a long pathname. It fails once the resulting pathname would exceed a length limit (apparently PATH_MAX, discussed below). That causes your script to generate a long stream of error messages, one for each failed cd, so I'm assuming that you didn't use dash.

    Curiously, the dash pwd built-in works correctly on long pathnames, so if you start dash inside a directory whose pathname is very long, you'll be able to see the correct pathname, even though you won't be able to use cd to navigate to other folders.

  • ksh — The cd built-in works rapidly, but its argument is limited in length. Also, the pwd built-in is buggy, truncating the path at some limit. $PWD is also truncated, and therefore the ksh prompt will be wrong if you request a prompt containing the current working directory. However, there is a workaround: the pwd command-line utility included in most Linux distros, typically at /bin/pwd, does handle long pathnames correctly, so you can use env pwd to see the correct path:

      PS1='$(basename "$(env pwd)") $ '
    
  • zsh — The only issue I found with zsh is the ridiculous amount of time cd takes when the path has a lot of components. If your default prompt includes the full pathname, your interactive shell won't have much room for anything else, but there don't seem to be other issues. If you limit the number of pathname components shown in the prompt, you'll get a usable prompt.
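For bash, a prompt in the same spirit as the ksh workaround seems possible, because `basename "$PWD"` printed the right name in the terminal session above even while `\W` did not. The following is a minimal sketch, not something tested beyond that session: `prompt_dir` is a hypothetical name, and it assumes bash's default behaviour of expanding command substitutions in `$PS1`.

```shell
# Hypothetical helper: print the last component of the working directory,
# taken from $PWD rather than from bash's \W prompt escape.
prompt_dir() {
    basename "$PWD"
}

# Single quotes defer the command substitution until each prompt is
# drawn, so the name is recomputed after every cd.
PS1='$(prompt_dir)\$ '
```

This costs one extra process per prompt, which is negligible next to the time bash already spends on cd in a very deep tree.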

Where do these bugs come from?

Probably, all of the bugs I just referred to have the same source.

Traditionally, Unix system libraries defined a macro called PATH_MAX, which could be used to create a buffer large enough to hold any valid path. The assumption was that there was some system limit on path names, and that the limit wasn't too large. Typical values, depending on OS, ranged from 256 to 4096.

The modern Posix standard doesn't make any claims about what the actual maximum path length might be, but it still defines the PATH_MAX macro:

PATH_MAX

  • Maximum number of bytes the implementation will store as a pathname in a user-supplied buffer of unspecified size, including the terminating null character. Minimum number the implementation will accept as the maximum number of bytes in a pathname.

On Linux, PATH_MAX is 4096. Note that although nothing stops you from redefining PATH_MAX in your code, it will have no effect because the number was effectively already compiled into the kernel.

In other words, if you are using some system interface which stores a pathname into a buffer, and that interface doesn't have any way to supply the length of the buffer, you can assume that no more than PATH_MAX bytes will be stored into the buffer. But that doesn't limit the length of a path. It just means that the interface you're using to figure out the path will return a truncated value.

One example of such an interface is getwd, which stores the full path to the current working directory into a buffer. getwd doesn't allow you to specify the size of the buffer, so the pathname is truncated at PATH_MAX - 1, or 4095 bytes on Linux.

That interface was long ago replaced with getcwd, which takes both a buffer and its length as arguments. But the Posix specification of getcwd does not provide any mechanism for figuring out how big the buffer needs to be, so it's pretty common to use the constant PATH_MAX anyway. Posix does, however, suggest that implementations could allow the caller to provide NULL instead of a buffer, in which case getcwd will automatically allocate a buffer which is large enough, fill it in, and return it. (The caller must then later call free() to return the allocated memory.) And the Linux implementation of getcwd does precisely that, which allows a program, such as a shell, to get the complete untruncated pathname.

Apparently, bash does use that in its built-in pwd, as well as to create the $PWD shell variable. But the $PS1 prompt replacer does not; when creating the prompt, the path is truncated to PATH_MAX - 1 characters. And that truncation occurs before the pathname is reduced to its base name for \W, so if you have a very long path and \W is in your $PS1, you'll get a part of a component from the middle of the path.

All of this makes it very confusing to see whether or not your script worked. But on all of the shells I tried except dash, the folders were correctly created (just very slowly with bash and zsh).
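If you want to know whether a session has reached the range where these truncation bugs can appear, you can compare the real length of the working directory's pathname with PATH_MAX. The sketch below is one way to do that, under a few assumptions: the function names are hypothetical, `getconf PATH_MAX /` is taken as the relevant limit, and `env pwd` is used (as in the ksh workaround above) so that the pathname does not come from a possibly buggy built-in.

```shell
# Hypothetical helper: succeed if the pathname given as $1, measured in
# bytes plus one for the terminating null character, does not fit in a
# buffer of $2 bytes.
path_exceeds() {
    [ $(( $(printf '%s' "$1" | wc -c) + 1 )) -gt "$2" ]
}

# Hypothetical helper: apply the check to the current working directory.
# Assumption: the limit that matters is the one getconf reports for the
# root filesystem (4096 on Linux).
cwd_exceeds_path_max() {
    path_exceeds "$(env pwd)" "$(getconf PATH_MAX /)"
}
```

For example, `cwd_exceeds_path_max && echo "prompt and pwd output may be truncated here"` prints its warning only when the pathname no longer fits in a PATH_MAX-sized buffer. It says nothing about whether the directory itself is valid, which it is.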

Community
rici