0

I have this assignment for homework, which consists of me implementing a buffer optimization to a string class. I have to save the pointer from bytes 8 to 16, because the first byte is used to see if the string is heap-allocated or stack-allocated, and bytes 1-7 are used to save the length of the string. I need help finding out how I can save a pointer to dynamically allocated memory and then return it from bytes 8-16.

How I create the buffer:

alignas(CharT) std::array<std::byte, SZ> buffer;

How I want to return the buffer:

return reinterpret_cast<CharT *>(buffer[8]);

I tried to use memcpy() and memmove(), but when I do that I am able to return only 8 bytes from the original string.

How I create and manipulate the pointer:

auto ptr = ::operator new[](count_bytes, std::align_val_t{std::alignment_of_v<CharT>});
std::memcpy(ptr, sv.data(), count_bytes);
std::memmove(&buffer[8], ptr, sizeof(ptr));

The implementation of the buffer:

template<typename CharT = char, std::size_t SZ = 32>
struct storage {
private: 
    alignas(CharT) std::array<std::byte, SZ> buffer;

public:

    void set(const std::basic_string_view<CharT> &sv) { 
        std::size_t count_bytes = sizeof(CharT) * (sv.length() + 1); ///counts the bytes of the string
        auto ptr = ::operator new[](count_bytes, std::align_val_t{std::alignment_of_v<CharT>}); /// make a pointer to the memory location
        std::memcpy(ptr, sv.data(), count_bytes);
        std::memmove(&buffer[8], &ptr, sizeof(ptr));
        operator delete[](ptr);
    }

    const CharT *str() {
        /// Returs a pointer to the first byte of the string
        return reinterpret_cast<CharT *>(buffer[8]);
    }
Remy Lebeau
  • 555,201
  • 31
  • 458
  • 770
stefank02
  • 3
  • 3
  • A typical solution for your version of SSO is to employ a `union` to alias the buffer with "long" data (pointer, capacity, size). This is how [libc++ works](https://github.com/llvm/llvm-project/blob/main/libcxx/include/string#L784). Note that it requires a C++ implementations which supports reading inactive union members. Relevant question: [What are the mechanics of short string optimization in libc++?](https://stackoverflow.com/q/21694302/580083) – Daniel Langr Jan 24 '23 at 12:03
  • None of this makes much sense in terms of how to design classes, how to handle manual memory allocation or how to write efficient code. If you have a private buffer then why set it fixed to a certain size... the memory is already allocated right there, by whoever created the template class. And you can't know if that memory is on the stack or the heap. – Lundin Jan 24 '23 at 12:55
  • The assignment said that I need to save the string in the buffer if it is small enough and set the first byte to 1 in order to know that it is on the stack. If not I set it to 0 and allocate dynamically. I did not provide this info because i did not think it is necessary for the answer of the question. Thanks a lot for the information! I read something about unions but I am have to use the array. – stefank02 Jan 24 '23 at 13:09
  • This whole task becomes a lot simpler with the use of placement new or `std::variant` – Mooing Duck Jan 24 '23 at 19:27
  • Does `storage` have to be implemented that way? Because IMO the better implementation would be for `storage` to have a `CharT* data` member, and then a buffer that contains either `CharT[SZ] bytes` (if data points at the buffer) or `std::size_t size` (if data points at the heap). That way most `const` methods (except `size()`) can simply use `data` without checking where the data is. `size()` requires a linear scan if `bytes` is in use, but it's still O(1) – Mooing Duck Jan 24 '23 at 19:38
  • I am not allowed to save the length or anything else really outside of the buffer. By outside i mean as a variable. Every characteristic of the string has to be inside like the allocation (1 for stack, 0 for heap) on byte 0, on byte 1 we have to have the length followed by the actual string or save the length on bytes 1-7 and save an address to the dynamically allocated memory from bytes 8-16 and has to be of type std::byte so I have a ptr to the data but i need a function to extract it from the storage and return it to the other classes. – stefank02 Jan 25 '23 at 08:42

2 Answers2

1

I see several issues with your code:

  • There is no need to use ::operator new[] directly, using new CharT[] normally will suffice.

  • You are copying 1 too many CharTs from sv into the memory you are allocating.

  • buffer[8] is a single byte, whose value you are reinterpreting as a pointer, which is wrong. You need to instead reinterpret the address of that byte. If you are intending to return a CharT* pointer to the actual bytes following your length and flag bytes (ie, for a stack-allocated string), then reinterpreting the byte address as a CharT* is fine. But, if you are intending to return the stored CharT* pointer (ie, to a heap-allocated string), then you need to instead reinterpret the byte address as CharT** and dereference that pointer.

  • you are not storing the allocated memory length or your allocation flag into your buffer, as you have described it must contain.

  • After copying the value of ptr into your buffer, you are calling operator delete[] to free the memory you just allocated, thus leaving buffer with a dangling pointer.

  • if the buffer holds a pointer to dynamically allocate memory, and the set() is called again, the previous allocated memory is leaked.

With that said, try something more like this instead:

template<typename CharT = char, std::size_t SZ = 32>
struct storage {
private: 
    static_assert(SZ >= (8 + sizeof(CharT*))), "SZ is too small");

    struct header {
        std::uint64_t flag: 8;
        std::uint64_t length: 56;
        union {
            CharT* ptr;
            std::byte data[SZ-8];
        } u;
    };

    alignas(CharT) std::array<std::byte, SZ> buffer;

public:

    void clear() {
        header *hdr = reinterpret_cast<header *>(&buffer[0]);
        if (hdr->flag == 1)
            delete[] hdr->u.ptr;
        buffer.fill(0);
    }

    void set(const std::basic_string_view<CharT> &sv) { 
        std::uint64_t len = sv.length();
        if (len > 0x00FFFFFFFFFFFFFFULL) { /// make sure the length fits in 7 bytes
            throw std::length_error("sv is too long in length");
        }

        clear();

        header hdr;
        hdr.flag = 1;
        hdr.length = len;
        hdr.u.ptr = new CharT[len+1]; /// make a pointer to the memory location
        std::copy_n(sv.data(), len, hdr.u.ptr);
        hdr.u.ptr[len] = static_cast<CharT>(0);

        std::memcpy(&buffer, &hdr, sizeof(hdr));
    }

    const CharT* str() const {
        /// Returns a pointer to the first byte of the string
        const header *hdr = reinterpret_cast<const header *>(&buffer[0]);
        if (hdr->flag == 1)
            return hdr->u.ptr;
        else
            return reinterpret_cast<const CharT*>(hdr->u.data);
    }

    uint64_t length() const {
        /// Returns the string length
        return reinterpret_cast<const header *>(&buffer[0])->length;
    }

    ...
};

That being said, I would take this a step further by making the flag field be 1 bit instead of 1 whole byte, thus giving the length field 7 more bits to work with, eg:

template<typename CharT = char, std::size_t SZ = 32>
struct storage {
private: 
    ...

    struct header {
        std::uint64_t flag: 1;
        std::uint64_t length: 63;
        ...
    };

    ...

public:

    void set(const std::basic_string_view<CharT> &sv) { 
        std::uint64_t len = sv.length();
        if (len > std::numeric_limits<int64_t>::max()) { /// make sure the length fits in 63 bits
            throw std::length_error("sv is too long in length");
        }

        ...
    }

    ...
};
Remy Lebeau
  • 555,201
  • 31
  • 458
  • 770
-1

the bytes 1-7 are used to save the length

You meant bytes 0-7.

std::memmove(&buffer[8], ptr, sizeof(ptr));

The 2nd parameter to memmove is the starting address of the bytes to copy. You are telling memmove to copy sizeof(ptr) bytes from the memory address that ptr is pointing to.

You describe your intended goal as to copy the pointer itself, its sizeof(ptr) bytes, rather than sizeof(ptr) from wherever the pointer is pointing to. If so, then this should be:

std::memmove(&buffer[8], &ptr, sizeof(ptr));
Sam Varshavchik
  • 114,536
  • 5
  • 94
  • 148
  • I tried to do that but when I looked at the memory view of &buffer[8] I get `40 00 00 00 10 61 00 00` which I don't know if is a valid ptr address and if so how can I return the whole thing. Because when I try to print the result I get the ASCII representation of the bytes( in this case: @----a--) – stefank02 Jan 24 '23 at 11:49
  • It's unclear to me, what you're looking at, what is the code that produced this result, and it is unlikely that anyone will be able to help you with code that you did not show. Based on the description in the given question, and the shown function call, that was the obvious problem. Of course, there could be some other issues with the code that you did not show, or perhaps your problem description was inaccurate. Unfortunately, nobody on Stackoverflow can figure out the problem with code that's not shown. Are you familiar with Stackoverflow's requirements for a [mre]? – Sam Varshavchik Jan 24 '23 at 11:55
  • 1
    The first byte (byte 0) is a flag, bytes 1-7 are used to save the length, according to the question. So OP really meant 1-7. – Daniel Langr Jan 24 '23 at 11:59
  • Thanks very much for the help! I edited the question to include how i actually do things in the hopes that I will make it more clear! – stefank02 Jan 24 '23 at 12:06
  • `operator delete[](ptr);` ends up deleting the buffer that was just allocated. Any subsequent attempts to examine what `ptr` was pointing to, and what `str()` returns, results in undefined behavior. Can you explain, in your own words, why you believe this `delete[]` is necessary? – Sam Varshavchik Jan 24 '23 at 12:32
  • Because otherwise I get leakage of memory and as far as I saw in the memory view the dynamically allocated block was unaffected – stefank02 Jan 24 '23 at 13:11
  • Then you need `delete` the memory ***after*** it is used. This is what destructors are for. The class also needs a copy constructor and an assignment operator. See your C++ textbook for more information. – Sam Varshavchik Jan 24 '23 at 13:25