I am trying to create a large tree of directories using the script below:
for i in {1..5000}
do
mkdir $i
cd $i
done
But the script stops after creating 1038 to 1040 directories. Is there some kind of limitation in the Linux system?
Linux doesn't really have a limit on the length of a file path. (Most Linux filesystems do limit the length of a single file name, typically to 255 bytes, but there is no limit on the number of components in a path.) So your script probably worked.
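To check this on your own machine, here is a minimal bash sketch that builds a chain of directories whose absolute path is longer than the 4096-byte PATH_MAX discussed further down. It uses only relative mkdir and cd steps and measures the result with the external pwd utility, reached through env as in the ksh workaround below. The helper name deep_tree_pathlen and the 50-byte component name are illustrative choices, not anything standard.

```shell
# Minimal sketch (bash): build DEPTH nested directories under ROOT using only
# relative mkdir/cd steps, then print the length of the absolute path as
# reported by the external pwd utility (all names are ASCII, so characters
# equal bytes). deep_tree_pathlen is a hypothetical helper name.
deep_tree_pathlen() {
  local root=$1 depth=$2 name i full
  name=$(printf '%050d' 0)   # a 50-byte component name: 000...0
  (
    cd -- "$root" || exit 1
    for ((i = 0; i < depth; i++)); do
      mkdir -- "$name" && cd -- "$name" || exit 1
    done
    full=$(env pwd) || exit 1   # external pwd, not the shell built-in
    echo "${#full}"
  )
}
```

For example, deep_tree_pathlen "$(mktemp -d)" 100 should print a number a little over 5100, well past 4096, even though no single step ever used a long path.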
However, bash (and other shells) do have problems with long pathnames. In particular, because of bugs in the interactive bash shell, once you are so deep in a directory hierarchy that the pathname of the current working directory exceeds a certain length, your prompt no longer shows the correct path. That can be very confusing; it might lead you to believe that the script failed.
Here's a sample terminal session. I start by changing my prompt to avoid filling the entire screen with the current path:
~/tmp$ export PS1='\W\$ '
tmp$ for i in {1..5000}; do mkdir $i; cd $i; done
10$ # The prompt is incorrect!
10$ basename "$PWD"
5000
10$ cd ~/tmp
tmp$ # count the number of directories in the hierarchy
tmp$ find 1 -type d | wc -l
5000
An unrelated problem, also exhibited by bash, is that very deep file hierarchies make certain actions (such as cd) very slow, presumably because the shell chooses to validate the entire path from the root. In the case of bash, that results in a call to stat on every component of the path, so the mkdir/cd loop in your script becomes quadratic in the depth of the file tree. zsh, which otherwise seems to handle long pathnames correctly, also slows down enormously when moving to a directory whose pathname has many components, although not as much (nor as consistently) as bash.
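If you want to watch the slowdown yourself, a rough bash sketch is to grow the directory chain in equal-sized batches and time each batch with bash's time keyword. The helper name time_batches and its parameters are hypothetical, and the timings are wall-clock only, so they will be noisy.

```shell
# Rough sketch (bash): extend a chain of directories named "d" in BATCHES
# batches of PER_BATCH levels each, printing the elapsed wall-clock time of
# every batch. time_batches is a hypothetical helper name.
time_batches() {
  local root=$1 batches=$2 per_batch=$3 b i
  (
    cd -- "$root" || exit 1
    TIMEFORMAT='%R'   # bash's `time` keyword: print elapsed seconds only
    for ((b = 1; b <= batches; b++)); do
      printf 'depth %d: ' "$((b * per_batch))"
      # Redirect the group's stderr so the timing report lands on stdout.
      { time {
          for ((i = 0; i < per_batch; i++)); do
            mkdir d && cd d || exit 1
          done
        }; } 2>&1
    done
  )
}
```

With larger batches, e.g. time_batches "$(mktemp -d)" 5 1000, the per-batch time in bash should grow as the tree gets deeper; the actual numbers depend on the machine and filesystem.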
bash —
The bash cd built-in works but, as noted above, is excessively slow when there are many components. Bash's built-in pwd seems to work fine, but the same cannot be said of the substitution logic in $PS1, which is used to produce the prompt in an interactive shell. Very long prompts also confuse the readline library, so you'll find that your bash session is essentially unusable if your default $PS1 includes \w (full path to working directory). As shown in the terminal session above, you can change the prompt to use \W (basename of working directory), but since the pathname it is computed from has already been truncated, the result is a very misleading prompt.
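Bash's $PWD still holds the full path (in the session above, basename "$PWD" printed 5000 while the prompt showed 10). Assuming the promptvars shell option is enabled, which is the default, bash performs parameter expansion on $PS1 every time it draws the prompt, so one possible workaround is to take the basename from $PWD yourself instead of relying on \W. This is a sketch I would expect to work given the behavior described here, not something shown in the session above.

```shell
# bash: show only the last component of $PWD in the prompt.
# Single quotes delay expansion until each prompt is drawn;
# ${PWD##*/} removes everything up to the last slash (so at / it is empty).
PS1='${PWD##*/}\$ '
```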
dash —
The dash cd built-in cannot handle long pathnames: it fails whenever the resulting pathname would exceed the PATH_MAX limit discussed below. That causes your script to generate a long stream of error messages, one for each failed cd, so I'm assuming that you didn't use dash.
Curiously, the dash pwd built-in works correctly on long pathnames, so if you start dash inside a directory whose pathname is very long, you'll be able to see the correct pathname, even though you won't be able to use cd to navigate to other folders.
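If you want a loop that behaves sensibly in any of these shells, including dash, a defensive variant of the script in the question stops at the first mkdir or cd failure instead of producing one error per remaining iteration. It uses a while loop because {1..5000} brace expansion is a bash/ksh/zsh feature that POSIX sh (and dash) does not have. The function name make_chain is hypothetical.

```shell
# Defensive variant of the question's loop (POSIX sh): create and enter
# directories 1, 2, ..., COUNT, stopping at the first failure.
# make_chain is a hypothetical helper name; it changes the caller's cwd,
# just like the original loop does.
make_chain() {
  count=$1
  i=1
  while [ "$i" -le "$count" ]; do
    if ! mkdir -- "$i" || ! cd -- "$i"; then
      echo "stopped at depth $i" >&2
      return 1
    fi
    i=$((i + 1))
  done
}
```

Calling make_chain 5000 in an empty directory reproduces the original script; under dash it should stop with a single message once cd gives up.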
ksh —
The cd built-in works rapidly, but its argument is limited in length. Also, the pwd built-in is buggy, truncating the path at some limit. $PWD is also truncated, and therefore the ksh prompt will be wrong if you request a prompt containing the current working directory. However, there is a workaround: the pwd command-line utility included in most Linux distros, typically at /bin/pwd, does handle long pathnames correctly, so you can use env pwd to see the correct path:
PS1='$(basename "$(env pwd)") $ '
zsh —
The only issue I found with zsh is the ridiculous amount of time cd takes when the path has a lot of components. If your default prompt includes the full pathname, your interactive shell won't have much room for anything else, but there don't seem to be other issues. If you limit the number of pathname components shown, you'll get a usable prompt.
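In zsh, the prompt escape %N~ shows only the last N components of the current directory (with ~ substituted for your home directory), which keeps the prompt short however deep you go. For example, to show the last two components:

```shell
# zsh: %2~ shows at most the last two components of the current directory;
# %# prints "#" for a privileged shell and "%" otherwise.
PS1='%2~ %# '
```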
Probably, all of the bugs I just referred to have the same source.
Traditionally, Unix system libraries defined a macro called PATH_MAX, which could be used to create a buffer large enough to hold any valid path. The assumption was that there was some system limit on pathnames, and that the limit wasn't too large. Typical values, depending on OS, ranged from 256 to 4096.
The modern POSIX standard doesn't make any claims about what the actual maximum path length might be, but it still defines the PATH_MAX macro:
PATH_MAX
- Maximum number of bytes the implementation will store as a pathname in a user-supplied buffer of unspecified size, including the terminating null character. Minimum number the implementation will accept as the maximum number of bytes in a pathname.
On Linux, PATH_MAX is 4096. Note that although nothing stops you from redefining PATH_MAX in your code, it will have no effect because the number was effectively already compiled into the kernel.
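You can ask the system for both limits from the shell with getconf, which reports the pathconf(3) values for the filesystem containing a given directory. On a typical Linux system with an ext4 root, this should print NAME_MAX=255 and PATH_MAX=4096. The helper name path_limits is hypothetical.

```shell
# Print the per-filesystem name and path limits for the filesystem that
# holds DIR (default /). path_limits is a hypothetical helper name.
path_limits() {
  dir=${1:-/}
  echo "NAME_MAX=$(getconf NAME_MAX "$dir")"
  echo "PATH_MAX=$(getconf PATH_MAX "$dir")"
}
path_limits /
```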
In other words, if you are using some system interface which stores a pathname into a buffer, and that interface doesn't have any way to supply the length of the buffer, you can assume that no more than PATH_MAX bytes will be stored into the buffer. But that doesn't limit the length of a path. It just means that the interface you're using to figure out the path will return a truncated value (or fail outright).
One example of such an interface is getwd, which stores the full path to the current working directory into a buffer. getwd doesn't allow you to specify the size of the buffer, so it cannot return a pathname longer than PATH_MAX - 1, or 4095 bytes on Linux. (The glibc version fails with ENAMETOOLONG in that case rather than overflow the buffer.)
That interface was long ago replaced with getcwd, which takes both a buffer and its length as arguments. But the POSIX specification of getcwd does not provide any mechanism for figuring out how big the buffer needs to be, so it's pretty common to use the constant PATH_MAX anyway. POSIX does, however, suggest that implementations could allow the caller to provide NULL instead of a buffer, in which case getcwd will automatically allocate a buffer which is large enough, fill it in, and return it. (The caller must then later call free() to return the allocated memory.) And the Linux implementation of getcwd does precisely that, which allows a program, such as a shell, to get the complete untruncated pathname. Apparently, bash does use that in its built-in pwd, as well as to create the $PWD shell variable. But the $PS1 prompt replacer does not; when creating the prompt, the path is truncated to PATH_MAX - 1 bytes. And that truncation occurs before the pathname is reduced to its base name for \W, so if you have a very long path and \W is in your $PS1, you'll get part of a component from the middle of the path.
All of this makes it very hard to tell whether or not your script worked. But on all of the shells I tried except dash, the directories were created correctly (just very slowly with bash and zsh).