7

(This is not so much a problem as an exercise in pedantry, so here goes.)

I've made a nice little program that is native to my linux OS, but I'm thinking it's useful enough to exist on my Windows machine too. Thus, I'd like to access Windows' environment variables, and MSDN cites an example like this:

const DWORD buff_size = 50;
LPTSTR buff = new TCHAR[buff_size];

const DWORD var_size = GetEnvironmentVariable("HOME",buff,buff_size);

if (var_size==0) { /* fine, some failure or no HOME */ }
else if (var_size>buff_size) {

    // OK, so 50 isn't big enough.
    if (buff) delete [] buff;
    buff = new TCHAR[var_size];
    
    const DWORD new_size = GetEnvironmentVariable("HOME",buff,var_size);

    if (new_size==0 || new_size>var_size) { /* *Sigh* */ }
    else { /* great, we're done */ }
}
else { /* in one go! */ }

This is not nearly as nice (to me) as using getenv and just checking for a null pointer. I'd also prefer not to dynamically allocate memory since I'm just trying to make the program run on Windows as well as on my linux OS, which means that this MS code has to play nicely with nix code. More specifically:

template <class T> // let the compiler sort out between char* and TCHAR*
inline bool get_home(T& val) { // return true if OK, false otherwise
#if defined (__linux) || (__unix)
    val = getenv("HOME");
    if (val) return true;
    else return false;
#elif defined (WINDOWS) || defined (_WIN32) || defined (WIN32)
    // something like the MS Code above
#else
    // probably I'll just return false here.
#endif
}

So, I'd have to allocate on the heap universally or do a #ifdef in the calling functions to free the memory. Not very pretty.

Of course, I could have just allocated 'buff' on the stack in the first place, but then I'd have to create a new TCHAR[] if 'buff_size' was not large enough on my first call to GetEnvironmentVariable. Better, but what if I was a pedant and didn't want to go around creating superfluous arrays? Any ideas on something more aesthetically pleasing?

I'm not that knowledgeable, so would anyone begrudge me deliberately forcing GetEnvironmentVariable to fail in order to get a string size? Does anyone see a problem with:

const DWORD buff_size = GetEnvironmentVariable("HOME",0,0);
TCHAR buff[buff_size];
const DWORD ret = GetEnvironmentVariable("HOME",buff,buff_size);
// ...

Any other ideas or any suggestions? (Or corrections to glaring mistakes?)

UPDATE: Lots of useful information below. I think the best bet for what I'm trying to do is to use a static char[] like:

inline const char* get_home(void) { // inline not required, but what the hell.
#if defined (__linux) || (__unix)
    return getenv("HOME");
#elif defined (WINDOWS) || defined (WIN32) || defined (_WIN32)
    static char buff[MAX_PATH];
    const DWORD ret = GetEnvironmentVariableA("USERPROFILE",buff,MAX_PATH);
    if (ret==0 || ret>MAX_PATH)
        return 0;
    else
        return buff;
 #else
        return 0;
 #endif
 }

Perhaps it's not the most elegant way of doing it, but it's probably the easiest way to sync up what I want to do between *nix and Windows. (I'll also worry about Unicode support later.)

Thank you to everybody who has helped.

Zorawar
  • 6,505
  • 2
  • 23
  • 41
  • In C++ you should return the value as a `std::basic_string`. That solves the memory management problem. However, please note that `%HOME%` is not set by default. You might be looking for `%USERPROFILE%`, ` %APPDATA%` or `%LOCALAPPDATA%`. – MSalters Nov 09 '10 at 12:55
  • I'm not familiar with Windows environment variables, so I defaulted to what I knew! Yeah, `%USERPROFILE%` is what I need, though. Returning the string value sounds like a good idea! I can just return an empty string as a fail indicator. I'll give it a go when I can be bothered to get off my laptop... – Zorawar Nov 09 '10 at 16:10
  • Over a decade later and I feel the same pain :( At the heavy risk of being labelled a pedant (do I get a linked in badge for that?), I think your check should be ret >= MAX_PATH as the API docs say: "If lpBuffer is not large enough to hold the data, the return value is the buffer size, in characters, required to hold the string and its terminating null character and the contents of lpBuffer are undefined.", so a MAX_PATH length would be a bug. I think. Sorry! I think there is also a theoretical risk of the value changing between calls, which might mean your second call could fail. – MarkH Jan 05 '22 at 13:50
  • @MarkH Ah, yes, thank you. And the potential race condition is something that ought to be considered, yes. Dare say things have moved on quite a bit since I first asked this, and perhaps the Windows API now has a very different canonical answer to my question? I cannot say, but thank you for both points (and the sympathy!). – Zorawar Jan 05 '22 at 15:14
  • 1
    As a side note, using a `static` buffer may bite you in the future as the function won't be thread-safe. We hit that issue with the standard library in Solaris, where they were avoiding a small memory allocation by using a `static char buffer[SIZE]`... which was great until you hit two threads calling the same function. Personally, I would make the function fail and report the needed size. – David Rodríguez - dribeas Apr 11 '23 at 09:27

5 Answers5

12
DWORD bufferSize = 65535; //Limit according to http://msdn.microsoft.com/en-us/library/ms683188.aspx
std::wstring buff;
buff.resize(bufferSize);
bufferSize = GetEnvironmentVariableW(L"Name", &buff[0], bufferSize);
if (!bufferSize)
    //error
buff.resize(bufferSize);

Of course, if you want ASCII, replace wstring with string and GetEnvironmentVariableW with GetEnvironmentVariableA.

EDIT: You could also create getenv yourself. This works because

The same memory location may be used in subsequent calls to getenv, overwriting the previous content.

const char * WinGetEnv(const char * name)
{
    const DWORD buffSize = 65535;
    static char buffer[buffSize];
    if (GetEnvironmentVariableA(name, buffer, buffSize))
    {
        return buffer;
    }
    else
    {
        return 0;
    }
}

Of course, it would probably be a good idea to use the wide character versions of all of this if you want to maintain unicode support.

Billy ONeal
  • 104,103
  • 58
  • 317
  • 552
  • resize. Yeah, that would be easier...(damn it). Edit: although, it's still messing about with memory allocation (dignity saved slightly?). Is there a way to do it by allocating memory only once, or is that a dumb question? – Zorawar Nov 09 '10 at 03:36
  • @Zorawar: In most STL implementations, if you `resize` to a lower value memory is not reallocated. Therefore there's only one allocation here. – Billy ONeal Nov 09 '10 at 13:43
  • I suppose you could do this with a stack allocated buffer but 131,070 bytes is an awfully large value to put on the stack. – Billy ONeal Nov 09 '10 at 13:50
  • Yeah, that's why I don't want to second-guess the environment variable size and use up all that memory for no reason! No replies about my idea of forcing `GetEnvironmentVariable` to fail deliberately so I can get a variable size. I guess it's either not a good idea, or redundant... – Zorawar Nov 09 '10 at 15:58
  • @Zorawar: The deliberate failure is, I believe, the "normal" C-like way of accessing this API. Doing this is really a tradeoff. You spend some more memory but you save calling the function twice. `GetEnvironmentVariable` has a reasonable amount of overhead all on it's own. You could probably design a faster version on top of GetEnvironmentStrings (Because it merely returns a pointer to the process environment block), but I assumed something like this was not performance critical code. – Billy ONeal Nov 09 '10 at 16:07
  • @Billy: Calling `GetEnvironmentVariable` twice did occur to me, and I did guess that it would probably be the worse option in practical terms. But the fact that `GetEnvironmentVariable` must be doing more work than what `getenv` would do is really driving the point home! – Zorawar Nov 09 '10 at 16:18
  • @Zorawar: It's probably *not* doing more work than `getenv` though. Just because an API is simpler doesn't mean it's doing less work. Though; environment variables are a bigger deal on *nix machines so I wouldn't be surprised if they have a lg(n) way of doing it. OTOH, it's difficult to provide a true posix-style `getenv` on Windows because Windows is a Unicode environment, while POSIX is not. Given the requirements of getenv though, it's likely a statically allocated buffer is used instead of the dynamic allocation proposed here. – Billy ONeal Nov 09 '10 at 16:38
  • @Zorawar: What I mean is, you could use a statically allocated buffer and implement `getenv` yourself, because calls to `getenv` destroy the return values of previous calls to `getenv`. However, if you implemented *exactly* `getenv` you'd be throwing Unicode support out the window,. – Billy ONeal Nov 09 '10 at 16:39
  • @Billy: A bit of messing about with the code has led me to the conclusion that doing something like this is probably the best bet (I'll update my question when I do it and check for sure). Any recommended `DWORD` value for the call to `GetEnvironmentVariable`? As you pointed out in the link provided above, I could use `32767*sizeof(...)`, but is there a practical limit less than this I should probably stick to? – Zorawar Nov 09 '10 at 17:25
  • @Zorawar: As MSalters points out, you're expecting a path here. Most of the Windows APIs assume you are using Win32 paths rather than NT paths, which limits the total usable path size to `MAX_PATH`, which (I believe) is 260 characters. – Billy ONeal Nov 09 '10 at 17:28
  • Using `static` here makes the app not thread safe. This approach is why VC++ deprecates many of the CRT functions (see also `strtok`). I would not follow the recommendations here personally, not just for that reason - the large default heap usage, and the need to implement again for Unicode also seem less than ideal. – Steve Townsend Sep 12 '14 at 17:46
  • @Steve: `getenv`'s interface is not thread safe. If you have multiple threads all messing with the environment there's probably a problem there anyway. I agree this is a terribly designed API but that's true of many APIs designed almost 30 years ago. – Billy ONeal Sep 12 '14 at 20:22
  • I'd say that's even more reason not to breathe new life into a broken API design pattern (your `WinGetEnv` code). Hence, the solutions that use stack-based data are preferable to me. I was not intending to say 'just use `getenv`'. – Steve Townsend Sep 12 '14 at 22:39
  • What happens when the value of an environment variable is a zero-length string? How do you distinguish that from an environment variable that is not defined at all, and from an error condition? – Lassi Aug 02 '20 at 16:21
  • returning 0 crashes the app use "" instead – joseph lake Jun 25 '23 at 20:19
7

This wasn't the original question, but it might worth to add the MFC way to this thread for reference:

CString strComSpec;
if (strComSpec.GetEnvironmentVariable(_T("COMSPEC")))
{
    //Do your stuff here
}
user1228651
  • 81
  • 1
  • 1
4

VC++ implements getenv in stdlib.h, see, for example, here.

David Norman
  • 19,396
  • 12
  • 64
  • 54
  • I'm not on my Windows machine right now, but it was spitting out something about deprecation when I tried to compile with getenv (unless I was being retarded and misread the error). I wasn't paying too much attention and I half-expected MS to be obtuse about this even though getenv is in the standard library. Why does MS have GetEnvironmentVariable in the first place? – Zorawar Nov 09 '10 at 03:33
  • I don't know. We use getenv here for our windows builds. I don't think we have any special compiler flags or anything. – David Norman Nov 09 '10 at 04:28
  • @Zorawar: `GetEnvironmentVariable` is part of the Win32 API, and can be used from other languages than C. `getenv` is part of C, and will obviously call `GetEnvironmentVariable`. Similarly, the Win32 API doesn't include `operator new` or `malloc`, but does contain `GlobalAlloc`. The "deprecated" message might happen when you call `putenv`, which is _not_ a standard C function. – MSalters Nov 09 '10 at 12:45
  • @MSalters: OK, that makes sense. For reference, the warning I get is `C4996: 'getenv': This function or variable may be unsafe. Consider using _dupenv_s instead...` – Zorawar Nov 09 '10 at 15:40
  • Theoretically, getenv is unsafe since it returns a pointer to an internal structure that could later be modified, e.g., through putenv. Since you're working on a 'nice little program' for personal use I'd imagine you're more interested in getting things going quickly rather than theoretical security concerns, so I think it's OK to use getenv. See http://msdn.microsoft.com/en-us/library/8ef0s5kh%28v=VS.80%29.aspx for more info on the warning. It looks like if you #define _CRT_SECURE_NO_WARNINGS the warning will go away. – David Norman Nov 09 '10 at 16:25
  • @David: Oh, I see. The VC++ compiler is just trying to be overly helpful rather than just sticking to syntactical issues. In my case, I don't think it's much of an issue since the way I'm using the environment variable won't cause any difficulties if something odd happens to it in between retrieving it and using it, so I might just stick with `getenv` in practise. – Zorawar Nov 09 '10 at 16:54
1

The suggestion you made at the end of your post is the right way to do this - call once to get required buffer size and then again to actually get the data. Many of the Win32 APIs work this way, it's confusing at first but common.

One thing you could do is to pass in a best-guess buffer and its size on the first call, and only call again if that fails.

Steve Townsend
  • 53,498
  • 9
  • 91
  • 140
  • Still, could I really justify two calls to `GetEnvironmentVariable` just to be a bit more pedantic about allocating memory? What's the textbook (C++) way of doing this? – Zorawar Nov 09 '10 at 17:53
  • This is Win32, which is a C API, not C++. Microsoft makes the rules here, and that's the 'right' way to do it. If you just want one call, you'll have to use a buffer big enough for anything you wish to handle. You still need to error check the API call anyhow. The max is 32767 including the terminating null, so no point in using a buffer bigger than that. – Steve Townsend Nov 09 '10 at 17:57
  • Keep in mind that there is a race condition between the two calls. In a simple app, it's a non-issue, but a strictly correct implementation is not specifically two calls, but a loop that calls it until it encounters success or an error. The _expected_ execution of that loop produces two calls, but if someone changes the variable in between the two calls, the loop will make a third call with a correspondingly-updated buffer. – Jonathan Gilbert Sep 12 '14 at 16:35
  • The buffer and its size are both stack-based. Only a very unusual app will allow either value to be altered between the two calls, even if multiple threads exist. – Steve Townsend Sep 12 '14 at 17:48
  • Do you think it is on purpose that they write in the documentation that the *content of buffer is **undefined*** instead of the content is *not touched*? Quote: [If lpBuffer is not large enough to hold the data, the return value is the buffer size, in characters, required to hold the string and its terminating null character and the contents of lpBuffer are undefined.](https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-getenvironmentvariable) – Wolf Oct 05 '20 at 13:49
0

Don't bother. %HOME% is a path on Windows, and should be usable by all reasonable programs. Therefore, it will fit in a WCHAR[MAX_PATH]. You don't need to deal with the edge case where it's longer than that - if it's longer, most file functions will reject it anyway so you might as well fail early.

However, do not assume you can use a TCHAR[MAX_PATH] or a char[MAX_PATH]. You do not have control over the contents of %HOME%; it will contain the users name. If that's "André" (i.e. not ASCII) you must store %HOME% in a WCHAR[MAX_PATH].

MSalters
  • 173,980
  • 10
  • 155
  • 350
  • Actually, on NT it need not fit in MAX_PATH, though lots of applications will probably break if it does not. ;) (NT's limit is 32k characters) And of course, someone can override the value with whatever they want, even if the override is not a valid path. – Billy ONeal Nov 09 '10 at 13:45
  • Ooh, don't tell me "Don't bother": I'm _this_ close to not bothering with the WinAPI at all! I haven't properly used VC++ in many years, but deosn't `TCHAR` resolved to `char` or `wchar_t`? (There's a `WCHAR` too!?) – Zorawar Nov 09 '10 at 15:54
  • @Zorawar: WCHAR is not a C or C++ datatype. It is a win32 datatype. Win32 is not a C api, it is a language agnostic API. Therefore they cannot depend on particular c types like char and wchar_t. On most platforms, CHAR expands to `char` and WCHAR expands to `wchar_t`. If you are compiling with Unicode support TCHAR will expand to WCHAR, and if you are compiling without Unicode support TCHAR will expand to CHAR. – Billy ONeal Nov 09 '10 at 16:48
  • @Billy ONeal: The 32KB limit only applies to paths with a `\\?\` prefix, which you're unlikely to see in a `%HOME%` variable - it breaks other common assumptions. – MSalters Nov 10 '10 at 08:04
  • Just to be clear, most of the underlying API has no issue with paths longer than `MAX_PATH`. `MAX_PATH` is the upper limit on the name of a path _component_, but the path can combine any number of components up to the size of a `UNICODE_STRING`. There is much confusion on the matter, though, because even within Microsoft, developers don't always understand this, and there are parts of the system API that can't handle paths longer than `MAX_PATH`. They are definitely possible, though, and properly-written applications can and do use them. – Jonathan Gilbert Sep 12 '14 at 16:32
  • It's also worth considering that %HOME% might be longer than 255 chars even if the home directory isn't allowed to by some of the components that handle it. %HOME% is not strictly required by the system to be a directory name. It's just an environment variable name, which also has a 64KB upper limit as I understand it. If your code assumes it will fit into MAX_PATH, you could encounter corner cases or undefined behaviour if an attacker intentionally makes it longer. – Jonathan Gilbert Sep 12 '14 at 16:33
  • @JonathanGilbert: That's a malicious environment and you can just fail safely there. Don't make the security problem worse, but `TerminateProcess` is probably acceptable. – MSalters Sep 12 '14 at 16:58
  • @MSalters, okay, but as per my previous comment it is still not "correct" to assume `%HOME%` will fit into `WCHAR[MAX_PATH]`. The best user experience, in my opinion, would come from code that dynamically allocates buffers for `%PATH%` and any derived paths, and accurately reports back whatever errors arise from APIs using those paths. Fixed-size buffers are a poor programming practice for a number of reasons, including security and run-time stack space usage. Also, any limit on the length of a path is "owned" by the system and thus should be enforced by the system, not your own application. – Jonathan Gilbert Oct 23 '14 at 19:25