
I'd like to write a function like this:

int validate_file_name(char *filename)
{
    //...
}

which will:

  • return 1 if there was no \0 character in the filename,
  • 0 otherwise.

I thought this could be done with a simple for(size_t i = 0; i < strlen(filename); i++) loop, but I don't know how to determine how many characters I have to check.

I can't use strlen() because it will terminate on the first occurrence of a \0 character.

How should I approach this problem?


Clarification:

I am trying to apply these guidelines to a filename I receive. If you should avoid putting a \0 in a filename, how can you validate that when you have no size parameter?

Moreover, there are strings with multiple \0 characters, like here: http://www.gnu.org/software/libc/manual/html_mono/libc.html#Argz-and-Envz-Vectors. Still, I had no idea that it is impossible to determine their length if it is not explicitly provided.


Conclusion:

There is no way to determine the length of a string that is not NULL-terminated, unless of course you already know the length or you resort to dirty hacks such as checking whether a pointer points to allocated memory.

Mateusz Piotrowski
  • need size parameter. – BLUEPIXY May 30 '15 at 21:14
  • I am implementing a header file and I cannot change anything there. So there's no way I could provide a size parameter. – Mateusz Piotrowski May 30 '15 at 21:16
  • Don't use `strlen()` like that, it iterates over the characters to count them, so at each iteration you iterate through the whole filename. – Iharob Al Asimi May 30 '15 at 21:16
  • 1
    Think of it: how could you possibly find the end of the filename if it can contain null terminators? You would need either a length parameter, or a different sentinel value that indicates the end of the string. – juanchopanza May 30 '15 at 21:18
  • 2
    Yeah, I agree with you guys. But the problem here is: 1. The string IS NULL terminated. 2. It might have NULL characters in it, since a user can provide such an evil filename. 3. I thought there might be some way around it by detecting the size of the memory block allocated for this string. – Mateusz Piotrowski May 30 '15 at 21:25
  • 1
    "Each argz vector is represented by a pointer to the first element, of type `char *,` ***and a size, of type size_t***..." – juanchopanza May 30 '15 at 21:42
  • @juanchopanza that is true. But still, I've got no idea how to validate a filename without knowing its length. According to the answers I received it is not possible. Thanks a lot @juanchopanza! You've been really patient :) – Mateusz Piotrowski May 30 '15 at 21:51

3 Answers

4

You are trying to solve a problem that does not need to be solved.

A file name is a string. In C, a "string" is by definition "a contiguous sequence of characters terminated by and including the first null character".

It is impossible to have a string or a file name with a null character embedded in it.

It's possible to have a sequence of characters with an embedded null character. For example:

char buf[] = "foo\0bar.txt";

buf is an array of 12 characters; the characters at positions 3 and 11 are both null characters. If you treat buf as a string, for example by calling

fopen(buf, "r")

it will be treated as a string with a length of 3 (the length of a string does not include the terminating null character).
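The gap between the size of the array and the length of the string can be made visible with a short sketch. The helper names print_sizes and demo_embedded_null are hypothetical and exist only for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints the array size next to the string length. */
static void print_sizes(const char *label, const char *buf, size_t array_size)
{
    printf("%s: array size = %zu, strlen = %zu\n",
           label, array_size, strlen(buf));
}

/* Hypothetical demo function, not part of the original answer. */
static size_t demo_embedded_null(void)
{
    char buf[] = "foo\0bar.txt";   /* 12 bytes, null characters at index 3 and 11 */

    /* sizeof works here only because buf is an array in this scope. */
    print_sizes("buf", buf, sizeof buf);

    /* Everything after the first null character is invisible to strlen(). */
    return strlen(buf);
}
```

Calling demo_embedded_null() reports an array size of 12 and a string length of 3. Once buf has decayed to a char* inside another function, the 12 is no longer recoverable.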

If you're working with character arrays that may or may not contain strings, then it makes sense to do what you're asking. You would need to keep track of the size of the buffer separately from the address of the initial character, either by passing an additional argument or by wrapping the pointer and the length in a structure.
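A minimal sketch of the "pointer plus length in a structure" approach follows. The type name_buf and the function has_no_embedded_null are assumptions made for this example; they are not part of any standard API. memchr() is used because, unlike the str* functions, it takes an explicit byte count and does not stop at a null character.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a character buffer that carries its own length. */
struct name_buf {
    const char *data;
    size_t      len;   /* number of bytes to inspect, excluding any final terminator */
};

/* Returns 1 if none of the len bytes is a null character, 0 otherwise. */
static int has_no_embedded_null(struct name_buf name)
{
    /* memchr scans exactly name.len bytes and returns NULL if '\0' is absent. */
    return memchr(name.data, '\0', name.len) == NULL;
}
```

For the earlier buf, passing { buf, sizeof buf - 1 } would yield 0, because the byte at index 3 is a null character.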

But if you're dealing with file names, it's almost certainly best just to deal with strings and assume that whatever char* value is passed to your function points to a valid string. If it doesn't (if there is no null character anywhere in the array), that's the caller's fault, and not something you can reasonably check.

(Incidentally, Unix/Linux file systems explicitly forbid null characters in file names. The / character is also forbidden, because it's used as a directory name delimiter. Windows file systems have even stricter rules.)
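If the goal is to validate a single file name component on a Unix-like system, the only check that can actually be made on a string is for the / character. The following is a sketch under that assumption, keeping the signature from the question. Rejecting a null pointer and an empty name is an additional assumption of this example, not something the question requires.

```c
#include <string.h>

/* Sketch only: returns 1 if filename looks like a valid single path
 * component on a Unix-like system, 0 otherwise. */
int validate_file_name(char *filename)
{
    /* Assumption: a null pointer or an empty name is treated as invalid. */
    if (filename == NULL || filename[0] == '\0')
        return 0;

    /* '/' is the directory separator and cannot be part of a component. */
    if (strchr(filename, '/') != NULL)
        return 0;

    /* There is nothing to check for '\0': the first one ends the string. */
    return 1;
}
```

With this sketch, "notes.txt" is accepted and "dir/notes.txt" is rejected.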

One last point: NULL is (a macro that expands to) a null pointer constant. Please don't use the term NULL to refer to the null character '\0'.

Keith Thompson
2

The answer is that you can't write a function that does that if you don't know the length of the string.

To determine the length of a string, strlen() searches for the '\0' character; if none is present, the behavior is undefined.

If you knew the length of the string then,

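for (size_t i = 0; i < length; ++i)
{
    /* an embedded null character makes the name invalid */
    if (string[i] == '\0')
        return 0;
}
return 1;   /* no null character among the first length characters */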

would work. If you don't know the length of the string, the loop condition would have to be

for (int i = 0 ; string[i] != '\0' ; ++i)

which obviously means that searching for the '\0' makes no sense, because its presence is what makes all the other string-related functions work properly.

Iharob Al Asimi
2

If the string is not NULL-terminated, what else is it terminated by? And if you don't know that, what is its length? If you know the answers to these questions, you know the answer to your question.

Gerard van Helden
  • It IS NULL terminated. I was just wondering, what if a user gives me a wrong filename. – Mateusz Piotrowski May 30 '15 at 21:23
  • They'd be pretty sharp to give you a filename containing a '\0' :) Aren't you just being cautious for the sake of being cautious here ...? – Gerard van Helden May 30 '15 at 21:26
  • 1
    @MateuszPiotrowski If it can contain `\0` then it is not a null-terminated string unless you consider the first `\0` to be the end of the string. – juanchopanza May 30 '15 at 21:28
  • 2
    Remember that `NULL` is a null *pointer* constant. Please don't use the term `NULL` to refer to the null *character* `'\0'`. (I know you copied the usage from the question.) – Keith Thompson May 30 '15 at 21:31
  • @GerardvanHelden In a way... 1. I've heard about strings with multiple `\0` here in the GNU C Documentation: http://www.gnu.org/software/libc/manual/html_mono/libc.html#Argz-and-Envz-Vectors. 2. I am trying to validate a filename according to this https://stackoverflow.com/questions/457994/what-characters-should-be-restricted-from-a-unix-file-name/458001#458001. They say that `\0` and `/` should not be a part of a file name, so I thought it might be a good idea to validate a filename a user tries to use. – Mateusz Piotrowski May 30 '15 at 21:32
  • @KeithThompson Let's not get into this debate ;) Because they really aren't that different, are they? :) – Gerard van Helden May 30 '15 at 21:35
  • 1
    @GerardvanHelden: Yes, they are almost completely different. – Keith Thompson May 30 '15 at 21:37
  • 1
    @MateuszPiotrowski: The GNU C documentation doesn't talk about strings with multiple `'\0'` characters; it talks about multiple strings, each of which is terminated by a `'\0'` character. Quite simply, a string cannot contain an embedded null character. If it did, it wouldn't be a string. – Keith Thompson May 30 '15 at 21:39
  • 1
    @KeithThompson conceptually they are but in practice they aren't. But I really didn't want to get into the debate. It's just semantics. I'm on your side though if you really want to start it ;) :P – Gerard van Helden May 30 '15 at 21:43
  • 1
    @GerardvanHelden: Both conceptually and in practice, null characters and null pointers are different. Their only similarity is that they can both be written in C source code as the constant `0`. I'm afraid I don't know what you're talking about. `char c = NULL;` can fail to compile if `NULL` is defined as `((void*)0)`. – Keith Thompson May 30 '15 at 21:54
  • 1
    @KeithThompson what I mean is that a pointer in fact is an integer, and a char is an integer internally as well. The fact that the compiler would warn us is just to make sure that we know what we're doing. But still, I agree on your case that we should be clear about things we're talking about, and yes, a '\0' character is something different than a NULL pointer, so I do agree with you. – Gerard van Helden May 30 '15 at 21:59
  • 1
    @GerardvanHelden: Ah, I see the source of your confusion. Pointers are not integers. If you think they are, try adding, multiplying, or dividing two pointer values. – Keith Thompson May 30 '15 at 22:03
  • I know, but still that's the compiler talking. And rightly so :) – Gerard van Helden May 30 '15 at 22:38
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/79225/discussion-between-keith-thompson-and-gerard-van-helden). – Keith Thompson May 30 '15 at 22:44