
How much space do I need to allocate for /proc/%u/fd/%u?

In the strace code, they allocate char path[sizeof("/proc/%u/fd/%u") + 2 * sizeof(int)*3];

I don't understand the calculation. How did they arrive at this size?

MicrosoctCprog
  • My guess: `MAX_PATH`. Unless you're strapped for space why take chances? I also hope you're using `snprintf` and not `sprintf`. – tadman Aug 18 '20 at 08:18
  • `3*sizeof(type) > log10(MAX_TYPE)` for all c types. – spectras Aug 18 '20 at 08:26
  • Well, since the pid will be some value between 1 and `/proc/sys/kernel/pid_max` and the file descriptor will be some value much lower than `/proc/sys/fs/file-max`, you could calculate the minimal safe buffer size by taking `floor(log10(x)+1)` for both of these. That being said, using `MAX_PATH` is certainly the easier option. – Felix G Aug 18 '20 at 08:29
  • @tadman: MAX_PATH doesn't really mean something on Linux. That macro exists mostly for historic reasons and to be source compatible with other (lesser?) operating systems. But in Linux (and the modern BSDs) it's trivial to create filesystem trees that vastly exceed the limits of MAX_PATH. – datenwolf Aug 18 '20 at 09:09
  • @datenwolf Unless they start using 2048 bit PID and FD numbers, it'll be fine. This isn't for general paths, but a very specific case. – tadman Aug 18 '20 at 09:10
  • @tadman: Indeed. The space saving yet future proof method would be `char proc_fd_path[sizeof("/proc//fd/") + (((sizeof(pid_t) + sizeof(int)) + 1)*10*CHAR_BIT - 2)/33 + 1];`; I'll leave it to the reader to work out, why this will *always* reserve enough memory for anything `/proc/%u/fd/%u` could throw at you. Hint log2(10) ≈ 3.3 – datenwolf Aug 18 '20 at 09:48
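A small sketch to make the question's arithmetic concrete. It assumes a POSIX system (for `pid_t`), and the function names are made up for illustration: it computes strace's buffer size, datenwolf's CHAR_BIT-based size from the comment above, and the size actually needed when both numbers are UINT_MAX.

```c
#include <assert.h>
#include <limits.h>     /* CHAR_BIT, UINT_MAX */
#include <stddef.h>     /* size_t */
#include <stdio.h>      /* snprintf */
#include <sys/types.h>  /* pid_t (POSIX) */

/* strace: the whole template (including the 4 characters of the two %u)
 * plus 3 digits per byte of int for each of the two numbers. */
size_t strace_size(void)
{
    return sizeof("/proc/%u/fd/%u") + 2 * sizeof(int) * 3;
}

/* datenwolf: the template without the %u, plus the combined bits of both
 * numbers (with a spare byte) divided by 3.3. Since 3.3 is slightly below
 * log2(10), this overestimates the number of decimal digits. */
size_t datenwolf_size(void)
{
    return sizeof("/proc//fd/")
         + (((sizeof(pid_t) + sizeof(int)) + 1) * 10 * CHAR_BIT - 2) / 33 + 1;
}

/* The longest string the format can produce with unsigned int arguments,
 * plus 1 for the terminating NUL. */
size_t worst_case_size(void)
{
    int len = snprintf(NULL, 0, "/proc/%u/fd/%u", UINT_MAX, UINT_MAX);
    assert(len >= 0);
    return (size_t)len + 1;
}
```

On a typical x86-64 Linux system (32-bit int and pid_t) these come out to 39, 33 and 31 bytes respectively, so both formulas leave a little headroom.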

4 Answers


That may seem a rather bizarre way to do it, but it works (see below). They're assuming that each number, the process ID and the file descriptor, needs no more than 3n + 2 characters, where n is the number of bytes in an int: the 2 comes from the %u in the template that the number replaces, and the 3n from the sizeof(int) * 3 term.

It does actually work, as the following table shows (assuming 8-bit chars). For each int size it lists the space the formula reserves per number, the maximum unsigned int value, and the digits that value needs compared with the space reserved:

sizeof(int)  sizeof(int) * 3 + 2  max unsigned int value  digits needed/reserved
-----------  -------------------  ----------------------  ----------------------
          1                    5                     255  3/5
          2                    8                   65535  5/8
          4                   14              4294967295  10/14
          8                   26    18446744073709551615  20/26
         16                   50            ~3.4 * 10^38  39/50
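A minimal sketch, assuming 8-bit chars, that recomputes the "needed" column for the 1, 2, 4 and 8 byte rows by counting the digits of 2^(8n) - 1 directly. The 16-byte row is left out because unsigned long long typically stops at 64 bits; the function names are illustrative only.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT, ULLONG_MAX */

/* Decimal digits in the largest unsigned value that fits in n 8-bit bytes,
 * i.e. 2^(8n) - 1. Only sizes that fit in unsigned long long are supported. */
unsigned max_digits_for_bytes(unsigned n)
{
    const unsigned ull_bits = sizeof(unsigned long long) * CHAR_BIT;
    assert(n >= 1 && 8 * n <= ull_bits);

    unsigned long long max = (8 * n == ull_bits) ? ULLONG_MAX
                                                 : (1ULL << (8 * n)) - 1;
    unsigned digits = 1;
    while (max >= 10) {   /* strip one decimal digit per iteration */
        max /= 10;
        digits++;
    }
    return digits;
}

/* What the strace formula reserves for each number: 3 digits per byte
 * plus the 2 characters of the %u it replaces. */
unsigned strace_chars_per_number(unsigned n)
{
    return 3 * n + 2;
}
```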

You can see that the storage allocated keeps up with the storage required, to the point where you could just as easily use:

char path[sizeof("/proc/%u/fd/%u") + 2 * sizeof(int) * 3 - 2]; // note the extra "- 2"

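And it keeps working as int grows, because each 8-bit byte contributes at most log10(256) ≈ 2.41 decimal digits, so an n-byte unsigned value never needs more than 3n digits (the table shows the needed sizes: 3, 5, 10, 20, 39, ...). The allocation grows by three characters per byte, so there will always be enough space. This does rely on 8-bit chars, though: with 16-bit chars, a two-char int is 32 bits wide and can need 10 digits, while the formula reserves only 2 * 3 + 2 = 8.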
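A sketch of the "- 2" variant in use, with a compile-time check that chars really are 8 bits, since the three-digits-per-byte argument depends on it. The macro and function names are hypothetical; snprintf is used so that, even if the sizing assumption were wrong, the result is reported as too long rather than overflowing the buffer.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stdio.h>    /* snprintf */

/* Three digits per byte is only enough when bytes have 8 bits. */
static_assert(CHAR_BIT == 8, "this sizing assumes 8-bit chars");

/* strace's size with two bytes taken off. */
#define PROC_FD_PATH_TIGHT (sizeof("/proc/%u/fd/%u") + 2 * sizeof(int) * 3 - 2)

/* Formats /proc/<pid>/fd/<fd> into buf, which must hold PROC_FD_PATH_TIGHT
 * bytes. Returns the string length, or -1 if snprintf reports that the
 * result would not fit. */
int proc_fd_path_tight(char *buf, unsigned pid, unsigned fd)
{
    int len = snprintf(buf, PROC_FD_PATH_TIGHT, "/proc/%u/fd/%u", pid, fd);
    if (len < 0 || (size_t)len >= PROC_FD_PATH_TIGHT)
        return -1;
    return len;
}
```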


You may think that, for safety and portability, you could programmatically read the maximum PID from /proc/sys/kernel/pid_max and the maximum number of open files from /proc/<pid>/limits, and use those to dynamically allocate enough space to hold the path:

pax:~> cat /proc/sys/kernel/pid_max
32768
pax:~> cat /proc/self/limits | awk '$1$2$3=="Maxopenfiles"{print $5}'
4096

Hence the first %u would need at most five characters and the second at most four.
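If you did want to go the procfs route, something like the following would do it. This is a sketch with hypothetical helper names. For the descriptor limit it swaps in getrlimit(RLIMIT_NOFILE), which reports the same limits as /proc/<pid>/limits without any text parsing, and it uses the hard limit, which is the fifth field the awk command above prints.

```c
#include <stdio.h>
#include <sys/resource.h>   /* getrlimit, RLIMIT_NOFILE (POSIX) */

/* Number of decimal digits needed to print v (1 for v == 0). */
unsigned decimal_digits(unsigned long v)
{
    unsigned digits = 1;
    while (v >= 10) {
        v /= 10;
        digits++;
    }
    return digits;
}

/* Reads one unsigned number from a procfs file.
 * Returns 0 on success, -1 on failure. */
int read_proc_ulong(const char *path, unsigned long *out)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%lu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* PIDs are always below pid_max, so the largest PID is pid_max - 1. */
int max_pid_digits(unsigned *digits)
{
    unsigned long pid_max;
    if (read_proc_ulong("/proc/sys/kernel/pid_max", &pid_max) != 0 || pid_max == 0)
        return -1;
    *digits = decimal_digits(pid_max - 1);
    return 0;
}

/* Descriptors are normally below the open-files limit; rlim_max is the
 * hard limit. Returns -1 if the limit is unavailable or unlimited. */
int max_fd_digits(unsigned *digits)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0
        || rl.rlim_max == RLIM_INFINITY || rl.rlim_max == 0)
        return -1;
    *digits = decimal_digits((unsigned long)(rl.rlim_max - 1));
    return 0;
}
```

With the values shown above (a pid_max of 32768 and a limit of 4096), these report 5 and 4 digits, since the largest PID is then 32767 and the largest descriptor 4095.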


Or, you could just allocate 20 characters for each, on the likely assumption that you won't see PIDs or FDs wider than 64 bits anytime in the near future :-) You could still check against the procfs files, if only to exit gracefully if the values were too large.
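The fixed-size option might look like this (a sketch; the macro and function names are made up). It formats with %llu, since unsigned long long is at least 64 bits, and snprintf's return value tells you when a value did not fit, so you can exit gracefully.

```c
#include <stdio.h>

/* Two numbers of up to 20 digits each: enough for any 64-bit unsigned value
 * (18446744073709551615 has 20 digits). sizeof counts the terminating NUL. */
#define PROC_FD_PATH_SIZE (sizeof("/proc//fd/") + 2 * 20)

/* Formats /proc/<pid>/fd/<fd> into buf, which must hold PROC_FD_PATH_SIZE
 * bytes. Returns the string length, or -1 if the values were too large,
 * so the caller can bail out instead of using a truncated path. */
int proc_fd_path20(char *buf, unsigned long long pid, unsigned long long fd)
{
    int len = snprintf(buf, PROC_FD_PATH_SIZE, "/proc/%llu/fd/%llu", pid, fd);
    if (len < 0 || (size_t)len >= PROC_FD_PATH_SIZE)
        return -1;
    return len;
}
```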


But, to be honest, both of those are overkill, since on Linux you may already have something meant to provide the limit on file paths: PATH_MAX (see limits.h). It's probably okay to use that and wear the minimal wastage. If you don't have it, you could get the maximum length with a call to pathconf(), or choose a sensible default (say, 60 characters) and use snprintf to detect whether the actual path was longer than your buffer, exiting gracefully if so.
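A sketch of that approach with the fallbacks in order: PATH_MAX if the platform defines it, otherwise pathconf() (queried here on /proc, which is an assumption), otherwise the 60-character default, with snprintf detecting a path that does not fit. The helper names are hypothetical.

```c
#include <limits.h>   /* PATH_MAX, if the platform defines it */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* pathconf */

/* Used only when neither PATH_MAX nor pathconf() gives an answer. */
#define FALLBACK_PATH_SIZE 60

size_t proc_path_buffer_size(void)
{
#ifdef PATH_MAX
    return PATH_MAX;
#else
    long n = pathconf("/proc", _PC_PATH_MAX);   /* -1 means no fixed limit */
    return n > 0 ? (size_t)n : FALLBACK_PATH_SIZE;
#endif
}

/* Returns a malloc'd "/proc/<pid>/fd/<fd>", or NULL if allocation failed
 * or the path did not fit. The caller frees the result. */
char *make_proc_fd_path(unsigned pid, unsigned fd)
{
    size_t size = proc_path_buffer_size();
    char *buf = malloc(size);
    if (buf == NULL)
        return NULL;
    int len = snprintf(buf, size, "/proc/%u/fd/%u", pid, fd);
    if (len < 0 || (size_t)len >= size) {   /* too long: fail gracefully */
        free(buf);
        return NULL;
    }
    return buf;
}
```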

If you really want to minimize storage (i.e., you think a PATH_MAX of 4K is too much wastage) then, by all means, use the strace method, but with two subtracted as described earlier (a).


(a) If you're going to worry about 4K here and there, you may as well do it fully :-)

paxdiablo
  • Many distributions have set `pid_max` to `131072` so they're getting bigger. It's kind of risky to assume something, and you'll blow through way more than ~4KB of memory digging through those `/proc` files for answers. – tadman Aug 18 '20 at 08:48
  • Reading `/proc/sys/kernel/pid_max` and `/proc/self/limits` ..just to get buffer size for `proc/pid/` would be just as bizarre ;-) Use `MAX_PATH` or even just hard-coding it to something like 256 is a better option. – P.P Aug 18 '20 at 08:50
  • @P.P: yes, I quickly came to that same opinion while editing the answer :-) – paxdiablo Aug 18 '20 at 08:52
  • Why use `MAX_PATH` (4kb) if we can safely use much less space? – MicrosoctCprog Aug 18 '20 at 08:56
  • @MicrosoctCprog: because finding out the actual minimum is a serious pain, when the 4K limit is readily available. – paxdiablo Aug 18 '20 at 08:59
  • @paxdiablo No, it doesn't have to be readily available. https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/limits.h.html#tag_13_23_03_02 – Andrew Henle Aug 18 '20 at 11:24
  • @Andrew, it *is* readily available for Linux, which is the OS tag in the question. Other operating systems (such as those following POSIX as per your link), I couldn't comment on. – paxdiablo Aug 18 '20 at 12:32
  • @paxdiablo Well, that's a violation of the POSIX standard, which states "constants in the following list **shall be omitted** from the `<limits.h>` header on specific implementations where the corresponding value is equal to or greater than the stated minimum, but where the value can vary depending on the file to which it is applied." Since Linux supports filesystems with varying values, it shouldn't define `MAX_PATH` at all. – Andrew Henle Aug 18 '20 at 13:40
  • @Andrew, most Linux distros are not certified as POSIX compliant, though they comply with quite a bit (see https://en.wikipedia.org/wiki/POSIX). This may well be one area where they do not. However, keep in mind that `linux/limits.h` is *not* `limits.h` and `PATH_MAX` appears to not be in the latter (the former is for the kernel I believe, not the distro). Apologies for that oversight, I have updated answer to suit. – paxdiablo Aug 18 '20 at 14:02

The best answer is to replace this calculation with MAX_PATH (include <limits.h>), which makes your code cleaner and more portable.

paramikoooo
  • *... and portable*. First, it's `PATH_MAX`. [and `PATH_MAX` **does not have to be defined**](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/limits.h.html#tag_13_23_03_02): "A definition of one of the symbolic constants in the following list **shall be omitted** from the `<limits.h>` header on specific implementations where the corresponding value is equal to or greater than the stated minimum, but where the value can vary depending on the file to which it is applied. The actual value supported for a specific pathname shall be provided by the `pathconf()` function." – Andrew Henle Aug 18 '20 at 11:19

The easy answer is to just use PATH_MAX because in this case it's going to be fine.

The more complicated version is what strace does.

That strace code does seem to have the right idea. It seems a fairly safe assumption that any given byte (0..255) needs at most 3 decimal digits to represent it.

That calculation presumes that a 4-byte int can always be represented in 3 digits per byte, i.e. no more than 12 digits.

In practice you'd need just 10 (4294967295 is the largest 32-bit unsigned value), so there's a safety margin here, and room for future-proofing if int ever gets bigger for some reason.

Update

Here's a quick demonstration of the strlen vs. sizeof issue:

#include <stdio.h>

int main(void) {
  // Character array: sizeof gives the size of the whole array, NUL included
  char tmpl[] = "/proc/%u/fd/%u";

  printf("sizeof(char[])=%zu\n", sizeof(tmpl));

  // Character pointer: sizeof gives the size of the pointer itself
  const char *ptr = "/proc/%u/fd/%u";

  printf("sizeof(char*)=%zu\n", sizeof(ptr));

  // String literal without variable: the literal is itself an array
  printf("sizeof(\"...\")=%zu\n", sizeof("/proc/%u/fd/%u"));

  return 0;
}

For me on a 64-bit system this produces the results:

sizeof(char[])=15
sizeof(char*)=8
sizeof("...")=15

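Note that sizeof applied to a char pointer is always the same thing, the size of the pointer itself (8 on a 64-bit system), whatever string it points to. Applied directly to a string literal, as the strace code does, sizeof gives the size of the literal's array including the terminating NUL: 15, the same as for tmpl[].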

The tmpl[] definition defines an array which, while largely interchangeable with a pointer due to array-to-pointer decay, does have its size available to the compiler in the scope in which it is defined.
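To illustrate the scope point: an array, including a string literal, keeps its size only where it is visible as an array, and once it is passed to a function it decays to a pointer. A small sketch, with made-up function names:

```c
#include <stddef.h>   /* size_t */

/* Where the literal appears, it is an array, so sizeof gives its full
 * size: 14 characters plus the NUL. */
size_t literal_size(void)
{
    return sizeof("/proc/%u/fd/%u");
}

/* An array passed to a function decays to a pointer, so this returns
 * sizeof(char *) no matter which string the caller passes. */
size_t size_after_decay(const char *s)
{
    return sizeof(s);
}
```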

tadman
  • also `sizeof("/proc/%u/fd/%u")` is not `strlen("/proc/%u/fd/%u")`, so the code is wrong – bruno Aug 18 '20 at 08:24
  • @bruno Ah, good catch there. That looks...bad. For large values that would overflow pretty badly, especially on a 32-bit machine. – tadman Aug 18 '20 at 08:25
  • warning as you say `2 * sizeof(int)*3]` is wrong too – bruno Aug 18 '20 at 08:32
  • @tadman If we do the math, we find that the allocation is **39** bytes (32-bit system). The size needed to store `/proc/1000000000/fd/1000000000` is **31** bytes. So I don't think there is a real risk here, only some wasted memory. – Mathieu Aug 18 '20 at 08:35
  • @bruno actually literal strings have type `const char [N]` so the size is correct in this case. Using `strlen` would make the array a VLA with the size calculated at runtime instead of compile time – phuclv Aug 18 '20 at 08:37
  • @phuclv ah yes, I supposed a literal string is s `const char*` but you are right, thank you – bruno Aug 18 '20 at 08:41
  • @tadman `sizeof` works correctly on a string literal. Try it and see. In your example you used it on a `char*`, which is not what the strace code does. – interjay Aug 18 '20 at 08:55
  • @interjay Indeed. Well, that changes everything. – tadman Aug 18 '20 at 08:57
  • @phuclv The type of string literal is actually `char [N]` (there's no `const` in C, unlike C++). – P.P Aug 18 '20 at 08:59
  • @P.P `const` has been [part of C](https://en.cppreference.com/w/c/language/const) since 1989. It's just not as pervasively used. – tadman Aug 18 '20 at 09:02
  • @tadman I was referring to the type of string literals, not the `const` keyword as such. Just to be *really* clear: the type of string literal in C is not `const char [N]` but `char [N]`. – P.P Aug 18 '20 at 09:22
  • @P.P Ah, yes, that's what I mean. It exists but it's not effectively used, and right, string literals aren't `const` in C even if they probably should be. C++ at least gets that right. – tadman Aug 18 '20 at 09:26
  • @tadman why did you close this issue https://github.com/strace/strace/issues/148 ? Don't they need to change `sizeof("/proc/%u/fd/%u")` to `strlen("/proc/%u/fd/%u")`? – MicrosoctCprog Aug 18 '20 at 09:28
  • *The easy answer is to just use `PATH_MAX`* No, because `PATH_MAX` does not have to exist. See https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/limits.h.html#tag_13_23_03_02 – Andrew Henle Aug 18 '20 at 11:23
  • @tadman `sizeof` returns a `size_t` which [must be printed using `%zu`](https://stackoverflow.com/q/940087/995714) or UB will happen – phuclv Aug 18 '20 at 12:47

If you're using Linux, asprintf(3) will allocate enough memory to hold the final string for you so you don't have to mess with variable-length arrays or working out the needed length ahead of time:

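// On glibc, asprintf needs _GNU_SOURCE defined before including <stdio.h>
char *path = NULL;
if (asprintf(&path, "/proc/%u/fd/%u", pid, fd) < 0) {
    // Handle error (path's contents are undefined here, so don't free it)
}
// Use path
free(path); // Don't forget to free it when done.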


You can also use snprintf(3) with a 0 length buffer to get the needed length, and then call it again with an appropriately sized buffer:

int len = snprintf(NULL, 0, "/proc/%u/fd/%u", pid, fd);
if (len < 0) {
    // Handle error
}
char buf[len + 1]; // Need to add one for the trailing nul
if (snprintf(buf, len + 1, "/proc/%u/fd/%u", pid, fd) != len) {
    // Handle error
}
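If asprintf isn't available, the two snprintf calls can be wrapped into a portable helper that behaves much like it. This is a sketch (the name format_alloc is made up), using vsnprintf so it accepts any format string:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Measures the formatted length first, then allocates exactly enough and
 * formats again. Returns a malloc'd string or NULL; the caller frees it. */
char *format_alloc(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (len < 0)
        return NULL;

    char *buf = malloc((size_t)len + 1);   /* + 1 for the trailing NUL */
    if (buf == NULL)
        return NULL;

    va_start(ap, fmt);   /* the first va_list was consumed; start over */
    int written = vsnprintf(buf, (size_t)len + 1, fmt, ap);
    va_end(ap);
    if (written != len) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

For example, format_alloc("/proc/%u/fd/%u", pid, fd) returns a heap string that the caller must free, or NULL on failure.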
Shawn