
I was reading some code and I came across this example. What I don't understand is why the author uses an offset of 1 from both variables on the last line. At first glance I would assume this is illegal because it is referring to a possibly uninitialized memory area (and it could cause a segmentation fault). My head keeps telling me undefined behavior but is this really so?

static bool lt(wchar_t a, wchar_t b)
{
    const std::collate<wchar_t>& coll =
        std::use_facet< std::collate<wchar_t> >(std::locale());
    return coll.compare(&a, &a+1, &b, &b+1) < 0;
}

The last line is the one in question. Why does the author do this, is it legal, and when should it be done?

David G
  • collate::compare takes a range of characters. Here, the author is using a single object as an iterator range. [This is perfectly legal](http://stackoverflow.com/q/9114657/485561). – Mankarse Aug 01 '13 at 02:40
  • @Mankarse Address of a function argument + 1: isn't that an invalid pointer? If a and b were pointers and it was (a, a + 1, b, b + 1), that would be valid; but this looks like undefined behavior. I will wait for someone who has read the specs. – Amarghosh Aug 01 '13 at 02:43
  • @Amarghosh: I *have* read the specs. See the linked QA. `[expr.add]/4: For the purposes of these operators, a pointer to a nonarray object behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.` – Mankarse Aug 01 '13 at 02:54
  • @Mankarse I wasn't suggesting otherwise. Thanks for the reference, learned something today :) – Amarghosh Aug 01 '13 at 03:25

3 Answers


It appears that the author just wanted to compare two characters using the current global locale.

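Since std::collate<T>::compare takes two half-open ranges [low, high), adding 1 to the address of each parameter makes each range exactly one character long, so the comparison stops after a is compared to b. The one-past-the-end pointers are only used as end markers and are never dereferenced, so there should be no invalid memory accesses.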
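As a minimal sketch of the idea above: the function below treats each parameter as a one-element range. The name `lt_single_char` is made up for illustration, and the result assumes the global locale orders L'a' before L'b', as the default "C" locale does.

```cpp
#include <locale>

// Hypothetical name; the body mirrors the function from the question.
bool lt_single_char(wchar_t a, wchar_t b)
{
    const std::collate<wchar_t>& coll =
        std::use_facet<std::collate<wchar_t>>(std::locale());

    // [&a, &a + 1) and [&b, &b + 1) are half-open ranges that each hold
    // exactly one character. The end pointers are only compared against
    // as "stop here" markers; they are never dereferenced.
    const wchar_t* a_first = &a;
    const wchar_t* a_last  = &a + 1;
    const wchar_t* b_first = &b;
    const wchar_t* b_last  = &b + 1;

    // compare() returns a negative value, zero, or a positive value.
    return coll.compare(a_first, a_last, b_first, b_last) < 0;
}
```

Calling `lt_single_char(L'a', L'b')` should return true under that assumption, and swapping the arguments should return false.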

Peter Huene
  • Even with no memory accesses I'm not sure this is legal. I know you can use an address of one past the last element for arrays, but I'm not sure you can for non-array variables. – David Brown Aug 01 '13 at 02:42
  • @DavidBrown: It is legal. The standard explicitly allows programs to use a variable as an array of exactly one element, and thus a pointer one beyond the variable is guaranteed to create a valid half-closed range. – David Rodríguez - dribeas Aug 01 '13 at 02:43

What book are you reading? It also depends on what you are comparing.

Sometimes you need to compare an ID of a certain size that happens to sit at the beginning of a buffer.

aah134

Testing your function

#include <locale>

static bool lt(wchar_t a, wchar_t b)
{
    const std::collate<wchar_t>& coll =
        std::use_facet< std::collate<wchar_t> >(std::locale());
    return coll.compare(&a, &a+1, &b, &b+1) < 0;
}


int main () {

    bool b = lt('a', 'b');
    return 0;
}

Inside the debugger

Breakpoint 1, main () at test.cpp:13
13      bool b = lt('a', 'b');
(gdb) s
lt (a=97 L'a', b=98 L'b') at test.cpp:6
6           std::use_facet< std::collate<wchar_t> >(std::locale());
(gdb) p &a
$1 = 0x7fffffffdddc L"a\001翿\x400885"
(gdb) p &a+1
$2 = 0x7fffffffdde0 L"\001翿\x400885"

From this I believe

  1. the code is legal
  2. but &a + 1 is referring to possibly uninitialized memory

From what gdb prints, I tend to think that taking the address of a wchar_t yields a char*. That would make &a (where a is a wchar_t) a char* to the start of the multibyte variable a, and &a+1 a pointer to its second byte. Am I correct?

ubi
  • No, you are not correct. `&a+1` gives a "one-past-the-end" address (treating `a` as a one-element array). It is not legal to dereference `&a+1`. This is used to treat `a` as a one element array, so that it can be used in `compare`, which expects to be given the start and end address of an array of characters. – Mankarse Aug 01 '13 at 03:13
  • You are correct in that `&a+1` may be referring to uninitialized memory. But that's ok, since the call to `coll.compare` does not read from or write to that memory. Instead, it uses the pointer to determine that it has reached the end of the sequence. – Marshall Clow Aug 01 '13 at 03:18
  • And you are incorrect in believing that `&a` is a `char *`: it is a `wchar_t *`. And `&a+1` is not the second byte of the character, but the address of the "next" wide character. – Marshall Clow Aug 01 '13 at 03:20
  • @Mankarse I came to the assumption from the gdb output. E.g., `p &a` outputs the whole multibyte character and `p &a+1` outputs all bytes except the first one, which seems equivalent to iterating a char array (multibyte character) with a `char*` – ubi Aug 01 '13 at 03:40
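The pointer type and stride discussed in these comments can be checked directly. The following is a rough sketch, with `stride_in_bytes` being a made-up helper name. It verifies at compile time that `&a` is a `wchar_t*`, and measures how many bytes `&a + 1` lies beyond `&a`. This matches the gdb session above, where the two printed addresses (0x7fffffffdddc and 0x7fffffffdde0) differ by 4 bytes rather than 1.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: byte distance between &a and &a + 1 for a wchar_t.
std::size_t stride_in_bytes()
{
    wchar_t a = L'a';

    // Taking the address of a wchar_t yields a wchar_t*, not a char*.
    static_assert(std::is_same<decltype(&a), wchar_t*>::value,
                  "&a has type wchar_t*");

    // Viewing both addresses as byte pointers shows how far &a + 1 moved.
    // Neither pointer is dereferenced here.
    const char* first = reinterpret_cast<const char*>(&a);
    const char* past  = reinterpret_cast<const char*>(&a + 1);
    return static_cast<std::size_t>(past - first);
}
```

The returned value should equal `sizeof(wchar_t)`, so `&a + 1` skips the whole character rather than a single byte. gdb prints a `wchar_t*` as a wide string, which is presumably why the second output looks like the first with one character dropped.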