1

I have a class that has an integral template parameter N.

template<unsigned int N>
class test {
}

Now I want to have a std::vector with an integral type that is as small as possible to hold N bits.

E.g.

class test<8> {
    std::vector<uint8_t> data;
}

class test<9> {
    std::vector<uint16_t> data;
}

class test<10> {
    std::vector<uint16_t> data;
}
...

Is there any better way to do this for N=1 to N=64?

max66
  • 65,235
  • 10
  • 71
  • 111
eclipse
  • 2,831
  • 3
  • 26
  • 34
  • What is the use case for such `vector`s? Does [`std::bitset`](http://en.cppreference.com/w/cpp/utility/bitset) solve your issue? – Algirdas Preidžius Sep 13 '17 at 20:46
  • I want to sort the resulting vector so I think bitset will not be feasible. – eclipse Sep 13 '17 at 20:48
  • To give some more context: it is a class that represents billions of strings with fixed length from an alphabet of 4 characters (genome). N is the number of characters and the maximum value is 32. 32 characters out of an alphabet of 4 can be represented with 64 bits. So the vector is basically lots of small genome samples and the size of those samples should be generic. – eclipse Sep 13 '17 at 20:52
  • @eclipse based on your comment, it looks like `test<16>` should use `uint32_t` (16 bases each of which takes 2 bits to store, for a total of 32 bits), but based on your question it looks like you want `uint16_t`. Which do you want? – Daniel H Sep 13 '17 at 21:31
  • 1
    Is there a reason it needs to be an integer type, or would any other type of the right size work if it had comparison operators to allow sorting? Also, since you want something sorted, do you need a `std::vector` instead of `std::set`? – Daniel H Sep 13 '17 at 21:37
  • @DanielH I simplified the question but yes `test<16>` should have 32 bits. Since performance is an issue I figured that a preallocated `std::vector` with one `std::sort` is the fastest. Other data types would be fine, as long as they are directly in the `std::vector` i.e. not pointers to objects. I am actually not aware of any other primitive type that works without pointers. Is there such a thing? Or can you add an object to a vector without a using a pointer? – eclipse Sep 14 '17 at 06:27

2 Answers2

3

What about using conditional?

#include <vector>
#include <cstdint>
#include <iostream>
#include <type_traits>

template <std::size_t N>
struct foo
 {
   static_assert( N < 65U, "foo 64 limit");

   using vType = typename std::conditional<
      (N < 9U), std::uint8_t,
        typename std::conditional< (N < 17U), std::uint16_t,
           typename std::conditional< (N < 33U), std::uint32_t, std::uint64_t
   >::type>::type>::type;

   std::vector<vType> data;
 };

int main()
 {
   static_assert( 1U == sizeof(foo<1>::vType), "!");
   static_assert( 1U == sizeof(foo<8>::vType), "!");
   static_assert( 2U == sizeof(foo<9>::vType), "!");
   static_assert( 2U == sizeof(foo<16>::vType), "!");
   static_assert( 4U == sizeof(foo<17>::vType), "!");
   static_assert( 4U == sizeof(foo<32>::vType), "!");
   static_assert( 8U == sizeof(foo<33>::vType), "!");
   static_assert( 8U == sizeof(foo<64>::vType), "!");

   // foo<65> f65; compilation error
 }

Or, in a more elegant way (IMHO), you can define a type traits (selectTypeByDim, in the following example) that select the first useful type in a list

#include <tuple>
#include <vector>
#include <cstdint>
#include <climits>
#include <type_traits>

template <std::size_t N, typename T,
   bool = (N <= sizeof(typename std::tuple_element<0U, T>::type)*CHAR_BIT)>
struct stbdH;

template <std::size_t N, typename T0, typename ... Ts>
struct stbdH<N, std::tuple<T0, Ts...>, true>
 { using type = T0; };

template <std::size_t N, typename T0, typename ... Ts>
struct stbdH<N, std::tuple<T0, Ts...>, false>
 { using type = typename stbdH<N, std::tuple<Ts...>>::type; };

template <std::size_t N, typename ... Ts>
struct selectTypeByDim : stbdH<N, std::tuple<Ts...>>
 { };

template <std::size_t N>
struct foo
 {
   static_assert( N < 65U, "foo 64 limit");

   using vType = typename selectTypeByDim<N,
         std::uint8_t, std::uint16_t, std::uint32_t, std::uint64_t>::type;

   std::vector<vType> data;
 };

int main()
 {
   static_assert( 1U == sizeof(foo<1U>::vType), "!");
   static_assert( 1U == sizeof(foo<CHAR_BIT>::vType), "!");
   static_assert( 2U == sizeof(foo<CHAR_BIT+1U>::vType), "!");
   static_assert( 2U == sizeof(foo<(CHAR_BIT<<1)>::vType), "!");
   static_assert( 4U == sizeof(foo<(CHAR_BIT<<1)+1U>::vType), "!");
   static_assert( 4U == sizeof(foo<(CHAR_BIT<<2)>::vType), "!");
   static_assert( 8U == sizeof(foo<(CHAR_BIT<<2)+1U>::vType), "!");
   static_assert( 8U == sizeof(foo<(CHAR_BIT<<3)>::vType), "!");

   //foo<(CHAR_BIT<<3)+1U> f65; compilation error
 }
max66
  • 65,235
  • 10
  • 71
  • 111
1

Based on the comments to your question, you don't actually need a built-in type. Any copy-able move-able object can be put into a vector.

If all you want to do with the data is sort it, I would recommend either std::bitset<2*N> or std::array<std::byte, (2*N+7)/8> (or std::array<std::byte, (2*N+CHAR_BIT-1)/CHAR_BIT> if you run on a weird system) with custom Compare for sorting. If you have any other uses for it (which you probably do), I would recommend a custom class with one of those as its underlying storage; this lets you add more features like getting a specific base.

On most systems, a std::bitset will have no overhead if N is a multiple of either 32 or 64 (implementation dependent), but that isn't guaranteed. The byte array won't have any more overhead than is required by C++ in general. Both of these can be expanded beyond 64 bits if necessary, and the byte array will have the least space overhead, less even than using integral types because it doesn't force the size to be a power of two. Adding a custom class on top of either of these doesn't give any overhead at all.

(If you are using C++14 or below you can use char or uint8_t instead of byte; a byte is closer to what you want but is only available in C++17. If you're using C++14 now but might switch, I'd recommend putting using byte = uint8_t in your code somewhere and then switching it out when you upgrade)

Daniel H
  • 7,223
  • 2
  • 26
  • 41
  • I actually used the solution with the byte array first. But I think I got one thing wrong which made me consider other options. When I have a class with only one field `std::array` how are objects of that class layed out inside the vector? I read today about 'placement new'. But I am not a hundred percent sure how it would work in this case. You say that it does not add overhead, how so? – eclipse Sep 14 '17 at 20:14
  • 1
    A vector has a bunch of objects laid out consecutively, one after the other, just like an array but created at runtime. A class is laid out exactly like each of its members one after the other, perhaps with some padding between members or at the end for alignment purposes. Since it would only have one data member, there would be no additional alignment requirements over just using that, and besides that data member would be `byte`s which don’t have alignment requirements anyway, so there would be no padding. – Daniel H Sep 14 '17 at 20:44
  • Hm okay I see how it would work like this with `std::byte[]` but `std::array` has more than one member that would be in the vector?! – eclipse Sep 14 '17 at 20:46
  • 1
    No, `std::array` is usually implemented to have a single member, of type `T[N]`, except I there’s a complication if `N` is zero. Even if they did something else, it probably wouldn’t be something which added space or time overhead. – Daniel H Sep 14 '17 at 20:48
  • 1
    Are you perhaps used to some other languages? In C++, classes are pretty much a zero-cost abstraction, but in a lot of other languages there is more indirection involved and wrapping things in a class can make them bigger or slower. – Daniel H Sep 14 '17 at 20:52
  • Java :D shame on me. I found this: "This container is an aggregate type with the same semantics as a struct holding a C-style array T[N] as its only non-static data member. " here: http://en.cppreference.com/w/cpp/container/array. Which confirms your answers. – eclipse Sep 14 '17 at 20:54
  • I think you are mistaken however since `std::array` can use extra padding. See here: https://stackoverflow.com/questions/19103244/is-the-size-of-stdarray-defined-by-standard?noredirect=1&lq=1 – eclipse Sep 14 '17 at 21:18
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/154486/discussion-between-daniel-h-and-eclipse). – Daniel H Sep 14 '17 at 21:21