strtok_r save state behaviour

Question

The correct way to use strtok_r is as follows:

```c
char* str = strdup(string);
char* save;
char* ptr = strtok_r(str, delim, &save);   /* first call: pass the string */
while(ptr) {
  puts(ptr);
  ptr = strtok_r(NULL, delim, &save);      /* later calls: pass NULL */
}
```

When I inspected what is actually stored in save, I found that it is just the rest of the unparsed string. So I tried to make every call look the same and wrote a wrapper as follows.
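The observation can be probed directly. The sketch below is hypothetical (`save_offset` is an illustrative name) and assumes a common implementation such as glibc or musl, where `save` is left pointing into the caller's buffer just past the first token; the `strtok_r` specification does not promise this. It compares pointers only with `==`, so it never subtracts pointers that might refer to different objects.

```c
#define _DEFAULT_SOURCE            /* exposes strtok_r on glibc */
#include <assert.h>
#include <string.h>

/* Hypothetical probe: after the first strtok_r call, return the index in
   buf that `save` points at, or -1 if it points outside buf (or there was
   no token). Implementation-specific: the standard does not define it. */
long save_offset(char *buf, const char *delim) {
    assert(buf != NULL && delim != NULL);
    size_t len = strlen(buf);      /* measure before strtok_r writes '\0' */
    char *save = NULL;
    if (strtok_r(buf, delim, &save) == NULL)
        return -1;
    for (size_t i = 0; i <= len; i++)
        if (save == buf + i)       /* equality only, no pointer subtraction */
            return (long)i;
    return -1;
}
```

On glibc, probing `char b[] = "ab,cd";` with `","` gives 3: `save` holds the address of `"cd"` inside `b`, not a copy of the text.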

```c
char* as_tokens(char** str, const char* const delim) {
  /* every call is a "subsequent" call: *str plays the role of saveptr */
  return strtok_r(NULL, delim, str);
}
```

It can be used as shown below, which is much less verbose: there is no need to distinguish between the first call and the rest.

```c
char* str = strdup(string);
char* ptr;
while((ptr = as_tokens(&str, delim)))   /* extra parentheses: assignment is intended */
  puts(ptr);
```

Are there any downsides in this approach? Am I causing any undefined behavior? I tried some edge cases, and both approaches behave the same.

Online Compiler: https://wandbox.org/permlink/rkGiwXOUtzqrbMpP

P.S. Ignoring memory leaks for brevity.

Update

A function very similar to my as_tokens already exists: strsep. The two differ when there are consecutive delimiters: strsep returns an empty string between them, while as_tokens (i.e. strtok_r) treats them as a single delimiter.
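The difference is easy to check by counting tokens. A minimal sketch, assuming a platform that provides `strsep` (glibc and the BSDs do; it is not part of ISO C or POSIX) and assuming `_DEFAULT_SOURCE` is the feature-test macro that exposes it on glibc; `count_strsep` is a hypothetical helper.

```c
#define _DEFAULT_SOURCE            /* exposes strdup and strsep on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: how many tokens strsep yields, empty ones included.
   Returns 0 if the copy cannot be allocated. */
size_t count_strsep(const char *input, const char *delim) {
    assert(input != NULL && delim != NULL);
    char *buf = strdup(input);     /* owned copy, kept for free() */
    if (buf == NULL)
        return 0;
    char *cursor = buf;            /* strsep advances this, eventually to NULL */
    size_t n = 0;
    while (strsep(&cursor, delim) != NULL)
        n++;
    free(buf);
    return n;
}
```

For `"a,,b"` split on `","`, `strsep` yields `"a"`, `""` and `"b"`, so `count_strsep` returns 3, whereas the `strtok_r` loop from the question yields only `"a"` and `"b"`. Note also that `strsep` on an empty input yields one empty token rather than none.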

I believe the way the state is saved is not specified in the `strtok_r` documentation, so relying on it would be risky. — Eugene Sh., Feb 20 '19 at 20:53
Note, too, that *no* part of the string is stored in `save`. `save` is a pointer. However, it would not be at all surprising for `save` to *point to* the tail of the string. It is important to maintain and be mindful of the distinction between a pointer and the thing to which it points. — John Bollinger, Feb 20 '19 at 21:04
The designers could have specified the implementation to work the way you want but chose not to. You are violating the requirement that `saveptr` be unmodified from the previous call when `str` is NULL. — stark, Feb 20 '19 at 21:19
@stark Where exactly am I modifying saveptr from the previous call? — balki, Feb 20 '19 at 21:22
The requirement violated is actually *"On the first call to strtok_r(), str should point to the string to be parsed"*, but I guess the question here is exactly "Can it be violated?" I think the general answer would be "No", as nothing is allowing it. — Eugene Sh., Feb 20 '19 at 21:23
Theoretically, an implementation could index the entire string during the first call (when `str` is not `NULL`), and `save` would point to the indexing data rather than just the tail of the string. — Barmar, Feb 20 '19 at 21:37
@JohnBollinger No, he makes every call a subsequent call, not a first call. — Barmar, Feb 20 '19 at 21:37
@Barmar I don't think creating an index would be possible as there is no way to free it. — balki, Feb 20 '19 at 21:41
@balki Isn't that true in general for whatever `save` points to? I don't see anything in the spec that says it must share memory with the original string, although that's the common implementation. — Barmar, Feb 20 '19 at 21:46
It's easier and safer to write your own implementation of `strtok_r` that will 100% behave this way than to depend on uncertain data. — KamilCuk, Feb 20 '19 at 21:49
@KamilCuk Or simply copy the code from glibc, since it's open source. — Barmar, Feb 20 '19 at 21:50
Also see the BSD [strsep()](http://man7.org/linux/man-pages/man3/strsep.3.html) function. — Shawn, Feb 21 '19 at 09:34
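KamilCuk's suggestion can be sketched with `strspn` and `strcspn`: a hypothetical `next_token` that keeps the `as_tokens` calling convention, but whose behavior is defined by its own code rather than by an unspecified detail of `strtok_r`. Like `strtok_r`, it treats a run of delimiters as one and writes `'\0'` into the buffer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical tokenizer with the as_tokens calling convention.
   *cursor is the unparsed tail; it is advanced past each returned token. */
char *next_token(char **cursor, const char *delim) {
    assert(cursor != NULL && delim != NULL);
    if (*cursor == NULL)
        return NULL;
    char *start = *cursor + strspn(*cursor, delim);  /* skip leading delimiters */
    if (*start == '\0') {
        *cursor = start;                              /* nothing left */
        return NULL;
    }
    char *end = start + strcspn(start, delim);        /* first delimiter or end */
    if (*end != '\0')
        *end++ = '\0';                                /* terminate token, step past it */
    *cursor = end;
    return start;
}
```

As with the question's wrapper, the pointer passed in is overwritten, so the result of `strdup` still has to be kept in a separate variable if it is to be freed.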

John Bollinger · Accepted Answer · 2019-02-20T21:57:30.270

Are there any downsides in this approach?

Yes, it loses the original value of str, making it impossible (in this case) to free it. You therefore have a memory leak. That could be solved by keeping a separate copy of the pointer, but that boils down to very nearly the same thing as your first code.
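Keeping a separate copy of the pointer naturally leads back to the ordinary idiom: the `strdup` result is held for `free`, and `save` is used only by `strtok_r`. A minimal sketch with a hypothetical `count_tokens` helper standing in for whatever is done with each token.

```c
#define _DEFAULT_SOURCE            /* exposes strdup and strtok_r on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts strtok_r tokens without leaking the copy.
   Returns 0 if the copy cannot be allocated. */
size_t count_tokens(const char *input, const char *delim) {
    assert(input != NULL && delim != NULL);
    char *buf = strdup(input);     /* owned pointer, never advanced */
    if (buf == NULL)
        return 0;
    char *save;                    /* parsing state, touched only by strtok_r */
    size_t n = 0;
    for (char *tok = strtok_r(buf, delim, &save); tok != NULL;
         tok = strtok_r(NULL, delim, &save))
        n++;
    free(buf);                     /* safe: buf still holds strdup's result */
    return n;
}
```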

Additionally, as was observed in comments, it does not comply with the specifications of strtok_r in that the behavior of a call to strtok_r with the first argument NULL is defined only in the context of a previous call to strtok_r that provided the value to which the third argument points.
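If a single uniform call per token is still wanted, one compliant option is a small wrapper that passes the string to `strtok_r` exactly once and `NULL` with the same `save` afterwards. A sketch under that assumption; `token_iter` and `token_iter_next` are hypothetical names, not a library API.

```c
#define _DEFAULT_SOURCE            /* exposes strtok_r on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical iterator: remembers whether the first call has been made. */
typedef struct {
    char *input;   /* string to parse; non-NULL only until the first call */
    char *save;    /* strtok_r's state; never modified by this code */
} token_iter;

char *token_iter_next(token_iter *it, const char *delim) {
    assert(it != NULL && delim != NULL);
    char *tok = strtok_r(it->input, delim, &it->save);
    it->input = NULL;              /* every later call continues from save */
    return tok;
}
```

With `token_iter it = { buf, NULL };`, the loop `while ((ptr = token_iter_next(&it, delim))) puts(ptr);` reads much like the question's, while `buf` stays available for `free`.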

Also, it departs from idiomatic, well-understood use of strtok_r, even going so far as to hide it in a different function. The normal idiom is not onerous, and it is well known and understood. Being clever about it makes your code a bit harder to maintain.

Am I causing any undefined behavior?

Yes, in the sense of "behavior that is not defined", as opposed to behavior that is explicitly called out as undefined. But the relevant standards attribute the same significance to those alternatives. See above.
