2

I know legacy is always a justification, but I wanted to check out this example from MariaDB and see if I understand it enough to critique what's going on,

static int show_open_tables(THD *, SHOW_VAR *var, char *buff) {
  var->type = SHOW_LONG;
  var->value = buff;
  *((long *)buff) = (long)table_cache_manager.cached_tables();
  return 0;
}

Here they're taking in char* and they're writing it to var->value which is also a char*. Then they force a pointer to a long in the buff and set the type to a SHOW_LONG to indicate it as such.

I'm wondering why they would use a char* for this though and not a uintptr_t -- especially being when they're forcing pointers to longs and other types in it.

Wasn't the norm pre-uintptr_t to use void* for polymorphism in C++?

Evan Carroll
  • 78,363
  • 46
  • 261
  • 468
  • You need to study how `show_open_tables()` is actually used in the broader codebase. I see it being stored in a lookup table with other functions that utilize other data types than `long`. So clearly the use of a `char*` parameter is part of a larger API that requires the functions to have a uniform signature. In this case, it might have made more sense to use `void*` instead of `char*` – Remy Lebeau Jul 07 '18 at 01:08
  • Is there any reason whatsoever to store the pointer as `char*` instead of `void*` and would this be an ideal case for `uintptr_t` with C++11 being that *"MySQL 8.0 source code permits use of C++11 features."*? – Evan Carroll Jul 07 '18 at 01:18
  • Files contain bytes at the lowest level. The closest C++ type is `char`. The same goes for memory. – Sid S Jul 07 '18 at 01:31
  • 1
    The cast to `long*` is presumably because the buffer has a known memory layout at the start, where it contains a `long` value. – Cheers and hth. - Alf Jul 07 '18 at 01:57
  • Good job that's a static. I'd hate anyone to see _my_ dirty linen in public. – Paul Sanders Jul 11 '18 at 05:12
  • @Remy _You need to study how show_open_tables() is actually used in the broader codebase._ In a word: carefully. – Paul Sanders Jul 11 '18 at 05:50

2 Answers2

2

There seems to be two questions here. So I've split my answer up.

Using char*

Using a char* is fine. Character types (char, signed char, and unsigned char) are specially treated by the C and C++ standards. The C standard defines the following rules for accessing an object:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,
  • a type that is the signed or unsigned type corresponding to the effective type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type.

This effectively means character types are the closest the standards come to defining a 'byte' type (std::byte in C++17 is just defined as enum class byte : unsigned char {})

However, as per the above rules casting a char* to a long* and then assigning to it is incorrect (although generally works in practice). memcpy should be used instead. For example:

long cached_tables = table_cache_manager.cached_tables();
memcpy(buf, &cached_tables, sizeof(cached_tables));

void* would also be a legitimate choice. Whether it is better is a mater of opinion. I would say the clearest option would be to add a type alias for char to convey the intent to use it as a byte type (e.g. typedef char byte_t). Of the top of my head though I can think of several examples of prominent libraries which use char as is, as a byte type. For example, the Boost memory mapped file code gives a char* and leveldb uses std::string as a byte buffer type (presumably to taking advantage of SSO).

Regarding uinptr_t:

uintptr_t is an optional type defined as an unsigned integer capable of holding a pointer. If you want to store the address of a pointed-to object in an integer, then it is a suitable type to use. It is not a suitable type to use here.

Jarra McIntyre
  • 1,265
  • 8
  • 13
  • This is all fine, but I seriously doubt if the author of that code would knew or cared about pointer aliasing issues. It simply wasn't relevant back then. – Paul Sanders Jul 11 '18 at 05:48
  • @PaulSanders: The author probably assumed that quality compilers suitable for systems programming would be able to handle that construct--an assumption which IMHO is just as true today as ever. The only thing that's changed is that compiler writers now treat quality-of-implementation issues as invitation to produce crummy-but-"conforming" compilers. – supercat Jul 11 '18 at 18:41
  • @supercat What systems programming? – Paul Sanders Jul 11 '18 at 22:52
  • @PaulSanders: Looking at the code snippet, I think it is intended for use on compilers that are suitable for systems programming (which includes, among other things, the ability to manage the re-use of storage for different purposes), as opposed to e.g. those that are intended exclusively for running high-end number crunching applications that don't require any "interesting" pointer semantics and never recycle heap storage except through `malloc` and friends [as opposed to allocating a big block and keeping it, while using the bytes therein for many purposes]. – supercat Jul 11 '18 at 23:33
  • @supercat It's part of MySQL, God help us. – Paul Sanders Jul 12 '18 at 02:55
  • @PaulSanders: The code is not portable, but should be usable on any quality general-purpose implementation for little-endian hardware which allows loads/stores of arbitrary alignment, if it makes a bona fide effort to be compatible with the dialect that was widely supported in the 1990s [something any quality general-purpose implementation for such platforms should be configurable to do, since it would fairly easily make the implementation compatible with a wide range of programs]. – supercat Jul 12 '18 at 04:46
  • @PaulSanders: Provided that `long` is configurable to be 32 bits, and aliasing logic isn't downright obtuse, the code should be fine. I'd worry far less about code which makes obvious portability assumptions than with what gcc may do with code which gets processed with full optimizations enabled and expects gcc to correctly process cases which the Standard actually defines, but which are hard for a compiler to distinguish from those it doesn't. – supercat Jul 12 '18 at 04:50
  • @PaulSanders As I said the method used will generally work. However, the reality is that using `memcpy` is just as easy and avoids undefined behaviour. I would expect a code base such as MySQL which targets a wide variety of operating systems, cpu architectures and compilers; and (presumably) aims for maximum stability to not violate aliasing rules. Of course there are plenty of other problems with this code -- e.g. the unchecked write to a byte buffer and I hope this is not intended to be ever serialised to disk but those issues seem orthogonal to the original question. – Jarra McIntyre Jul 12 '18 at 08:39
0

they're taking in char* and they're writing it to var->value which is also a char*. Then they force a pointer to a long in the buff and set the type to a SHOW_LONG to indicate it as such.

Or something. That code is hideous.

I'm wondering why they would use a char* for this though and not a uintptr_t -- especially being when they're forcing pointers to longs and other types in it.

Who knows? Who knows what the guy was on when he wrote it? Who cares? That code is hideous, we certainly shouldn't be trying to learn from it.

Wasn't the norm pre-uintptr_t to use void* for polymorphism in C++?

Yes, and it still is. The purpose of uintptr_t is to define an integer type that is big enough to hold a pointer.

I wanted to check out this example from MariaDB and see if I understand it enough to critique what's going on

You might have reservations about doing so but I certainly don't, that API is just a blatant lie. The way to do it (if you absolutely have to) would (obviously) be:

static int show_open_tables(THD *, SHOW_VAR *var, long *buff) {

  var->type = SHOW_LONG;
  var->value =  (char *) buff;
  *buff = (long)table_cache_manager.cached_tables();
  return 0;

}

Then at least it is no longer a ticking time bomb.


Hmmm, OK, maybe (just maybe) that function is used in a dispatch table somewhere and therefore needs (unless you cast it) to have a specific signature. If so, I'm certainly not going to dig through 10,000 lines of code to find out (and anyway, I can't, it's so long it crashes my tablet).

But if anything, that would just make it worse. Now that timebomb has become a stealth bomber. And anyway, I don't believe it's that for a moment. It's just a piece of dangerous nonsense.

Paul Sanders
  • 24,133
  • 4
  • 26
  • 48
  • Whether `char*` or `long*` is better depends upon the nature of the container. If code is intended exclusively for quality implementations that are suitable for low-level programming and run onplatforms that allow unaligned stores, and if the pointers in question are formed by allocating arbitrary number of bytes from a pool, `char*` wouldn't be unreasonable though I'd favor `uint8_t`. On quality implementations that allow it, I'd favor casting to `__packed long*`, but that's not a standard feature. – supercat Jul 11 '18 at 18:44
  • @supercat I have no idea what you're talking about, sorry. – Paul Sanders Jul 11 '18 at 22:48
  • @supercat But my comment above might shed some light on things. – Paul Sanders Jul 12 '18 at 02:56
  • The purpose of the code is most likely to write four consecutive bytes at the address given by `buff`. Essentially equivalet to `temp = ...; buff[0] = temp; buff[1] = temp >> 8; buff[2] = temp >> 16 buff[3]= temp >> 24;` but much faster on most implementations that support it. – supercat Jul 12 '18 at 04:58
  • @supercat Well of course. You can see the code doing exactly that. _So why tell the caller to pass in a char *`_? Like I say, timebomb. The author must just have had a bad day. – Paul Sanders Jul 12 '18 at 05:51
  • If different parts of the buffer will be written using different types, having the caller use a `char*` to the buffer and add byte offsets to pointers before passing them to various routines seems cleaner than requiring that the caller convert pointers before passing them to different functions, especially since the caller should have no reason to care about whether the called function is doing four 8-bit writes or one 32-bit write unless the platform would impose stiffer alignment requirements on the latter. – supercat Jul 12 '18 at 06:43