2

I have an application that builds file path names through a series of string concatenations using pieces of text to create a complete file path name.

The question is whether an approach to handle concatenating a small but arbitrary number of strings of text together depends on Undefined Behavior for success.

Is the order of evaluation of a series of nested functions guaranteed or not?

I found this question Nested function calls order of evaluation however it seems to be more about multiple functions in the argument list rather than a sequence of nesting functions.

Please excuse the names in the following code samples. It is congruent with the rest of the source code and I am testing things out a bit first.

My first cut on the need to concatenate several strings was a function that looked like the following which would concatenate up to three text strings into a single string.

typedef wchar_t TCHAR;

TCHAR *RflCatFilePath(TCHAR *tszDest, int nDestLen, TCHAR *tszPath, TCHAR *tszPath2, TCHAR *tszFileName)
{
    if (tszDest && nDestLen > 0) {
        TCHAR *pDest = tszDest;
        TCHAR *pLast = tszDest;

        *pDest = 0;   // ensure empty string if no path data provided.

        if (tszPath) for (pDest = pLast; nDestLen > 0 && (*pDest++ = *tszPath++); nDestLen--) pLast = pDest;
        if (tszPath2) for (pDest = pLast; nDestLen > 0 && (*pDest++ = *tszPath2++); nDestLen--)  pLast = pDest;
        if (tszFileName) for (pDest = pLast; nDestLen > 0 && (*pDest++ = *tszFileName++); nDestLen--)  pLast = pDest;
    }

    return tszDest;
}

Then I ran into a case where I had four pieces of text to put together.

Thinking through this it seemed that most probably there would also be a case for five that would be uncovered shortly so I wondered if there was a different way for an arbitrary number of strings.

What I came up with is two functions as follows.

typedef wchar_t TCHAR;

typedef struct {
    TCHAR *pDest;
    TCHAR *pLast;
    int    destLen;
} RflCatStruct;

RflCatStruct RflCatFilePathX(const TCHAR *pPath, RflCatStruct x)
{
    TCHAR *pDest = x.pLast;
    if (pDest && pPath) for ( ; x.destLen > 0 && (*pDest++ = *pPath++); x.destLen--)  x.pLast = pDest;
    return x;
}

RflCatStruct RflCatFilePathY(TCHAR *buffDest, int nLen, const TCHAR *pPath)
{
    RflCatStruct  x = { 0 };

    TCHAR *pDest = x.pDest = buffDest;
    x.pLast = buffDest;
    x.destLen = nLen;

    if (buffDest && nLen > 0) {   // ensure there is room for at least one character.
        *pDest = 0;   // ensure empty string if no path data provided.
        if (pPath) for (pDest = x.pLast; x.destLen > 0 && (*pDest++ = *pPath++); x.destLen--)  x.pLast = pDest;
    }
    return x;
}

Examples of using these two functions is as follows. This code with the two functions appears to work fine with Visual Studio 2013.

TCHAR buffDest[512] = { 0 };
TCHAR *pPath = L"C:\\flashdisk\\ncr\\database";
TCHAR *pPath2 = L"\\";
TCHAR *pFilename = L"filename.ext";

RflCatFilePathX(pFilename, RflCatFilePathX(pPath2, RflCatFilePathY(buffDest, 512, pPath)));
printf("dest t = \"%S\"\n", buffDest);


printf("dest t = \"%S\"\n", RflCatFilePathX(pFilename, RflCatFilePathX(pPath2, RflCatFilePathY(buffDest, 512, pFilename))).pDest);


RflCatStruct  dStr = RflCatFilePathX(pPath2, RflCatFilePathY(buffDest, 512, pPath));
//   other stuff then
printf("dest t = \"%S\"\n", RflCatFilePathX(pFilename, dStr).pDest);
Richard Chambers
  • 16,643
  • 4
  • 81
  • 106
  • 3
    The evaluation order of parameters to the *same* function is not specified. So if you pass *more than one* expression having a side effect (such as a function call), then you might have troubles. But in your case you are passing one at most. – Eugene Sh. Feb 15 '18 at 22:30
  • The only problem I see in the first code of block is that if the `nDestLen` runs to 0, you don't set the 0-terminating byte. I'd do at the end `if(nDestLen == 0) *pDest = 0;` to make sure that the destination stores a valid string. – Pablo Feb 15 '18 at 22:35
  • 1
    Is there a reason why you don't use `strcat` or `snprintf`? That would make the code much smaller: `snprintf(tszDest, nDestLen, "%ls%ls%ls", tszPath ? tszPath : "", tszPath2 ? tszPath2 : "", tszFileName ? tszFileName : "");` – Pablo Feb 15 '18 at 22:39
  • @Pablo if the destination buffer length is zero then it is assumed there is no room to put anything including an end of string terminator. This follows what is standard for `strncpy()` and `strncat()`. – Richard Chambers Feb 16 '18 at 01:33
  • @Pablo with the secure versions of the string manipulation functions, `strcat_s()` and others, the destination buffer length is expected and the secure versions of the functions is what is expected these days. At least Visual Studio complains about every `strcat()` and `sprintf()` so in this particular application, I have a preprocessor define to turn off that check. One of the reasons for this approach is to have a general purpose string copy function that could be used across functions that are being used to build text strings. This approach has a functional programming vibe I like as well. – Richard Chambers Feb 16 '18 at 01:41
  • @RichardChambers I didn't say when the buffer length is 0, I said when the `nDestLen--` gets to 0 in the loop. But if you are trying to have the same behaviour as `strncat`, then you don't need to set the 0-terminating byte. But again, why not use `strncat`? I find all these code in one line is hard to read and in my opinion you are not gaining anything from that. – Pablo Feb 16 '18 at 02:26

1 Answers1

1

Arguments to a function call are completely evaluated before the function is invoked. So the calls to RflCatFilePath* will be evaluated in the expected order. (This is guaranteed by §6.5.2.2/10: "There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call.")

As indicated in a comment, the snprintf function is likely to be a better choice for this problem. (asprintf would be even better, and there is a freely available shim for it which works on Windows.) The only problem with snprintf is that you may have to call it twice. It always returns the number of bytes which would have been stored in the buffer had there been enough space, so if the return value is not less than the size of the buffer, you will need to allocate a larger buffer (whose size you now know) and call snprintf again.

asprintf does that for you, but it is a BSD/Gnu extension to the standard library.

In the case of concatenating filepaths, there is a maximum string length supported by the operating system/file system, and you should be able to find out what it is (although it might require OS-specific calls on non-Posix systems). So it might well be reasonable to simply return an error indication if the concatenation does not fit into a 512-byte buffer.

Just for fun, I include a recursive varargs concatenator:

#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

static char* concat_helper(size_t accum, char* chunk, va_list ap) {
  if (chunk) {
    size_t chunklen = strlen(chunk);
    char* next_chunk = va_arg(ap, char*);
    char* retval = concat_helper(accum + chunklen, next_chunk, ap);
    memcpy(retval + accum, chunk, chunklen);
    return retval;
  } else {
    char* retval = malloc(accum + 1);
    retval[accum] = 0;
    return retval;
  }
}
char* concat_list(char* chunk, ...) {
    va_list ap;
    va_start(ap, chunk);
    char* retval = concat_helper(0, chunk, ap);
    va_end(ap);
    return retval;
}

Since concat_list is a varargs function, you need to supply (char*)NULL at the end of the arguments. On the other hand, you don't need to repeat the function name for each new argument. So an example call might be:

concat_list(pPath, pPath2, pFilename, (char*)0);

(I suppose you need a wchar_t* version but the changes should be obvious. Watch out for the malloc.) For production purposes, the recursion should probably be replaced by an iterative version which traverses the argument list twice (see va_copy) but I've always been fond of the "there-and-back" recursion pattern.

rici
  • 234,347
  • 28
  • 237
  • 341
  • Shims for an asprintf-like function here: https://stackoverflow.com/a/30687023/1566221. But I think the Windows version is no longer necessary. – rici Feb 16 '18 at 02:49
  • Thank you for the C Standard reference. I appreciate you taking the time for the variable arguments version of a string concatenation function. I have used that approach with some other things, including the use of an end of argument list value such as a NULL pointer, but did not think to consider it for this problem. I don't really want to go the route of a `malloc()` and the housekeeping needed as this will mostly be on the stack and temporary. I am not persuaded that `snprintf()` is a better choice, merely a different choice. – Richard Chambers Feb 16 '18 at 02:50
  • 1
    @richard: if you don't plan on allocating memory, then snprintf is a lot less typing, but of course tastes differ. The varargs thing was a C&P so I didn't actually spend much time on it. If I had, I might have written a version which takes a buffer address and length instead of doing a malloc; the modification would be trivial. But it's a cheeky solution; it comes from the desk drawer of "fun recursive stuff" which is more didactic than practical. – rici Feb 16 '18 at 02:58
  • I can understand "fun recursive stuff". The approach I worked out would come under what I would call "fun functional looking stuff". Lol My first thought was that this would seem to be a perfect place for currying except that C doesn't really let you do that. I have a large number of places where `sprintf()` is used for creating logs and it works great for that except for all the warnings generated so I've started converting to `snprintf()`. But it is a lot of code. – Richard Chambers Feb 16 '18 at 03:17