
Given a path to some directory (e.g. path/to/directory), I want to obtain output in the following format:

100|path/to/directory/a/file1.txt
200|path/to/directory/b/file2.txt
321|path/to/directory/c/d/file3.txt
...

Here 100, 200, 321 etc. are the sizes (in bytes) of the corresponding files, and the symbol | is used as a separator.

I tried to use the following command:

find path/to/directory -maxdepth 3 -type f -printf "%s|%p\n" > path/to/output.txt

But then I got many error messages of the following form:

find: `path/to/directory/my directory 1': No such file or directory

I noticed that this happened when the name of some directory contained at least one space. So my question is: what command will produce the desired output no matter which characters occur in file and directory names?
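
As a sanity check, here is a minimal sketch that runs the same -printf expression on a path containing spaces. It assumes GNU find and a throwaway directory created with mktemp; the directory and file names are made up for illustration. If it prints a size|path line without errors, spaces by themselves are unlikely to be the cause.

```shell
# Scratch directory; the names below are hypothetical and only mimic the report.
tmp=$(mktemp -d)
mkdir -p "$tmp/my directory 1"
printf 'hello' > "$tmp/my directory 1/file 1.txt"   # 5 bytes of content

# Same expression as in the question, pointed at the scratch directory.
out=$(find "$tmp" -maxdepth 3 -type f -printf '%s|%p\n')
printf '%s\n' "$out"

# Clean up the scratch directory.
rm -rf "$tmp"
```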

The question is not a duplicate of this question because there is an additional issue of problematic names. This answer mentions the issue of subdirectory or file names containing white space, then says that it can be solved by -print0 | xargs -0, then says that it's easier to avoid using the xargs construction altogether. I have found one relevant answer here, but was unable to modify it so that it would fit my needs.

EDIT

I have found the cause of the problem. I run these commands from Cygwin or MSYS2, and the failure occurs because the long file names contain characters that are encoded with more than one byte. The problem is also described here and here. As a result, Cygwin and MSYS2 are unable to deal with some files even though their paths do not exceed the maximum length (255 characters) allowed by the OS (Windows). For example, I have created a file whose full path (in Windows) is the following:

E:/test1/αβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγ.txt

Then I run the following command:

find E:/test1 -maxdepth 3 -type f -printf "%s|%p\n" > E:/test2/output.txt

But I got the following error:

find: ‘E:/test1/αβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγαβγα\316’: No such file or directory

As far as I understand, it cannot be fixed.
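
The gap between the character count and the byte count of such a name is easy to check. The following is a minimal sketch, assuming the script is saved as UTF-8 and using a hypothetical name built from 60 repetitions of αβγ (not the exact file above). Treating 255 as a per-name limit in bytes is an assumption here, not something this check confirms.

```shell
# Build a hypothetical name: "αβγ" repeated 60 times, plus ".txt".
reps=60
name=
i=0
while [ "$i" -lt "$reps" ]; do
    name="${name}αβγ"
    i=$((i + 1))
done
name="${name}.txt"

chars=$((reps * 3 + 4))                            # 3 letters per repetition + ".txt"
bytes=$(printf '%s' "$name" | wc -c | tr -d ' ')   # size of the name in UTF-8 bytes
limit=255                                          # assumed per-name limit, in bytes

# First byte of "α" in octal, for comparison with the \316 in the error message.
first_byte=$(printf 'α' | od -An -to1 | awk '{print $1}')

echo "characters: $chars, bytes: $bytes, limit: $limit, first byte of α: $first_byte"
```

In UTF-8 each of these Greek letters takes two bytes, so the 184-character name occupies 364 bytes. The first of the two bytes is octal 316, which would be consistent with the name in the error message having been cut in the middle of a character.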

lyrically wicked
  • I think you have not posted the **full** error message. Error messages usually start with the name of the program, followed by a colon, for instance `bash:` or `find:`. We don't even know from your posting, whether bash or find is producing the error message! – user1934428 Aug 15 '23 at 07:35
  • Your use of `find` should not care about spaces in file or directory names. A possible reason, that has nothing to do with spaces, would be that some files or directories appear/disappear during the search. Does the content of `path/to/directory` change during the execution of `find`? – Renaud Pacalet Aug 15 '23 at 08:21
  • @RenaudPacalet: Yes, you are right. It turns out that the problem is not in whitespaces in directory names. I don't know whether I should edit my post or ask a new question. In short, the `my directory 1` part in an error message was not related to a directory, it was a filename which was _cut_ to some number of chars. So, for example, a file named `file00...00abcd.txt` (the name contains more than 160 chars) gives the `find: 'path/to/directory/a/fileaa...ab': No such file or directory` error message. I have no idea why. – lyrically wicked Aug 15 '23 at 08:55
  • You should [accept an answer](https://stackoverflow.com/help/someone-answers) to the question you asked (assuming you got an answer to it) and then ask a new question. – Ed Morton Aug 15 '23 at 20:29
  • Haha! At the time I mentioned Windows in my answer, I didn't know you're doing this on Windows. I just mentioned it as an aside, like, hey, though we are using a wrapper for the `nftw` function, this works on Windows, too. – Kaz Aug 17 '23 at 15:45
  • I appreciate the useful links in your question; so this is related to the setting of `NAME_MAX`; I did not even know that we do have a **byte** limit for the length of a file name. I checked the [POSIX standard](https://pubs.opengroup.org/onlinepubs/009695399/basedefs/limits.h.html) and found that a POSIX-compliant implementation could even set the limit to as low as 14 bytes!!! Considering this, the 255 bytes set by Cygwin appears generous... – user1934428 Aug 18 '23 at 09:08

2 Answers


I think you can do it like this:

find path/to/directory -type f -exec stat -c "%s|%n" {} \;
  • -exec stat -c "%s|%n" {} \; runs the stat command once for each file found. %s is the file size in bytes, %n is the file name, and the | character separates the size from the path.

You can also try this:

find path/to/directory -maxdepth 3 -type f -exec sh -c 'size=$(stat -c "%s" "$1"); echo "$size|$1"' sh {} \; > path/to/output.txt
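
A possible variant, as a sketch assuming GNU find and GNU stat (the list_sizes function name is made up for illustration): ending -exec with + instead of \; passes many file names to a single stat invocation, which avoids starting one process per file.

```shell
# list_sizes DIR: print "size|path" for regular files up to 3 levels below DIR.
# Hypothetical helper; relies on the -c format option of GNU stat.
list_sizes() {
    find "$1" -maxdepth 3 -type f -exec stat -c '%s|%n' {} +
}

# Example: list_sizes path/to/directory > path/to/output.txt
```

The output order is whatever order find visits the files in, so pipe it through sort if a stable order matters.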
Freeman
  • Now I get the error messages of the form `stat: cannot stat 'path/to/directory/longname000...012': No such file or directory` (here `longname000...01` is the long name of a file, cut to the initial 160–190 chars, i.e. the original name is `longname000...01234abcd.txt`). I do not know what causes the problem. Maybe I should ask another question because the problem is not in the `find` command. Aside from this issue, the command produces the correct output. – lyrically wicked Aug 15 '23 at 09:12
  • @lyricallywicked Yes, the error messages you're getting are because the file names are too long for the `stat` command, and it's probably not a problem with the `find` command itself. This happens when the file names exceed the maximum length allowed by your operating system, so to fix it, you can try shortening the file names or moving them to a directory with a shorter path! – Freeman Aug 15 '23 at 09:22

Using the nftw function via TXR Lisp:

$ txr sizepath.tl /usr/share/man
2408|/usr/share/man/vi/man1/help2man.1.gz
2246|/usr/share/man/ru/man8/groupmems.8.gz
2451|/usr/share/man/ru/man8/groupmod.8.gz
[...]

The nftw POSIX C library function has a FTW_CHDIR flag specified in the standard which causes the function to chdir into every directory it traverses.

I believe that when FTW_CHDIR is used, nftw is not thwarted by long paths, since internally, it's using short relative paths to do its descent.
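
The idea can be illustrated with a small shell sketch. It is not how nftw is implemented; it only assumes a POSIX-like shell, GNU stat, and a made-up function name walk. The function changes into each directory before looking at its entries, so stat is only ever given a name relative to the current directory, while the longer display path is assembled separately for output.

```shell
# walk PREFIX: print "size|path" for regular files under the current directory.
# PREFIX is only used to build the printed path; it is never passed to stat.
walk() {
    for entry in * .[!.]* ..?*; do
        if [ -L "$entry" ]; then
            continue                      # skip symbolic links in this sketch
        elif [ -f "$entry" ]; then
            printf '%s|%s\n' "$(stat -c %s -- "$entry")" "$1/$entry"
        elif [ -d "$entry" ]; then
            # Descend in a subshell so the caller's working directory and
            # loop variable are left untouched when the subshell exits.
            ( cd -- "$entry" && walk "$1/$entry" )
        fi
    done
}

# Example: ( cd path/to/directory && walk path/to/directory )
```

Glob patterns that match nothing stay as literal strings, fail all three tests, and are skipped.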

In TXR Lisp, this flag value appears as the variable ftw-chdir. The sizepath.tl program is:

(ftw *args*
     (lambda (path type stat . rest)
       (when (eql type ftw-f)
         (put-line `@{stat.size}|@path`)))
     ftw-chdir)

Regardless of whether you use ftw-chdir, the path argument to the callback function gives the full path, as if the chdir calls were not taking place. In other words, if the starting points are relative paths, then path will be a path relative to the original directory. If we had to open the file, we would have to use (base-name path), but we don't: ftw gives us a stat argument, which is the stat structure, and we just pull the size field from that.

Although nftw is a POSIX function, this works in the Windows version of TXR, which carries an extensive POSIX run-time library.

At the Windows 10 cmd.exe prompt:

C:\Users\kaz>copy con sizepath.tl
(ftw *args*
     (lambda (path type stat . rest)
       (when (eql type ftw-f)
         (put-line `@{stat.size}|@path`)))
     ftw-chdir)
^Z
        1 file(s) copied.

C:\Users\kaz>txr sizepath C:\Users\kaz\AppData\Local\Microsoft
51959|C:\Users\kaz\AppData\Local\Microsoft/CLR_v4.0/ngen.log
2339|C:\Users\kaz\AppData\Local\Microsoft/CLR_v4.0/UsageLogs/AsusJWTDecoder.exe.log
1327|C:\Users\kaz\AppData\Local\Microsoft/CLR_v4.0/UsageLogs/AsusLiveUpdateToast.exe.log
[...]

Native paths with drive letter names and all.

Kaz
  • The TXR installer for Windows requires administrative privileges to run. I cannot use such programs. – lyrically wicked Aug 17 '23 at 03:33
  • @lyricallywicked That looks fixable: https://stackoverflow.com/questions/45467280/nsis-installer-without-admin-privileges Certain installers ask the user, do you want to install this for just the current user, or for all users? I wonder, is that what they are doing: for all users, they request privilege, and then can put it into a directory like C:\Program Files. – Kaz Aug 17 '23 at 04:33
  • @lyricallywicked Thanks for bringing this important issue to my attention. The installer will request the privilege level `highest` from now on, so that for admin users it will do the escalation, but for non-admin users it won't. In unprivileged mode, the default installation path will be altered to the user's local AppData directory, and the local PATH will be manipulated in the HKCU hive of the registry rather than HKLM. I might re-issue the installers for TXR 291 with these changes or wait until 292. – Kaz Aug 17 '23 at 05:50
  • @lyricallywicked I prepared a re-spin of the 291 installers: https://www.kylheku.com/txr-downloads/txr-291-win-update/ Let me know if there are any problems installing this as an unprivileged user on your machine. Cheers. – Kaz Aug 17 '23 at 06:41
  • Does the installer use the User Account Control in _any_ form? Does it require the right to write to Registry? – lyrically wicked Aug 17 '23 at 07:04
  • @lyricallywicked I don't know the exact APIs that the NSIS-generated installer uses for requesting privilege. The new installer should no longer be requesting admin privilege for non-admin users. It will not have the right to write to privileged parts of the registry; it will put the program into the user-local PATH variable, not the system one. That PATH is in the HKEY_CURRENT_USER registry hive that the user can write to. An unprivileged user should not be seeing any User Account Control popup any more. – Kaz Aug 17 '23 at 07:22
  • Will an admin user see any User Account Control popup? If yes, that would mean that the installer does use the UAC. If no, I do not understand what you meant by "for admin users it will do the escalation." – lyrically wicked Aug 17 '23 at 07:50
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/254944/discussion-between-kaz-and-lyrically-wicked). – Kaz Aug 17 '23 at 13:16