How do I add my own function to std::vector?

Question

I would like to add a function that returns the .size() value as a signed integer (int) instead of an unsigned integer.

Edit: In response to the comments, here is a more detailed explanation:

I have code:

int something = 3;
if(arr.size() > something)

This produces a compiler warning (signed/unsigned comparison), and I dislike adding (int) everywhere I have this. So I thought it would be nice to have a sizei() function:

int something = 3;
if(arr.sizei() > something)

This wouldn't produce a warning.

So I don't want to create a separate function, but a function in std::vector itself.

Edit: It seems the only way to do this is to write a separate (non-member) function, such as:

template <typename T>
inline int sizei(const T &arr){
    return (int)arr.size();
}

On the positive side, this doesn't seem to increase my executable size at all.

I'm not sure this question makes any sense. Do you have a specific problem with the type being returned? — Adrian Conlon, Jun 15 '11 at 17:13
The question was unreasonably downvoted. Some compilers emit warnings on mixed signed-unsigned comparisons, and the only good use case I've found for unsigned numbers is well-defined modular arithmetic. Using unsigned to restrict the domain is (almost) always the wrong thing to do, IMO. — zvrba, Jun 15 '11 at 17:17
Of course you shouldn't compare mixed signs, but if this comes up you should convert the result of `size()` at the place where you need it, i.e. the conversion. That way you localize the problem and don't need to change the semantics of the vector, and the inherently unsigned nature of a "size". — Kerrek SB, Jun 15 '11 at 17:20
@littleadv, because i get warnings when i compare ints vs unsigned ints. @Kerrek, im not trying to replace size(), but create another function called sizei() to return int. — Rookie, Jun 15 '11 at 17:21
What's wrong with `size_t something = 3; if(arr.size() > something)`? That's the correct way to do this. — Judge Maygarden, Jun 15 '11 at 17:29
@Judge, because i would then have to convert all my integer types into unsigned ints....... and its much easier if i just had a function called sizei() at the places where i compare to ints. — Rookie, Jun 15 '11 at 17:30
@Rookie - then compare to unsigned `int`s. If for whatever reason you can't - cast. That's what it is for. — littleadv, Jun 15 '11 at 17:32
@Rookie It may seem easier now, but it won't be easier to maintain in the long run. You'll possibly sacrifice correctness as well. — Judge Maygarden, Jun 15 '11 at 17:33
@Rookie: not all your integer types, just the ones that represent sizes. Btw, even if you want a signed type, `int` is the wrong choice for something representing the possible size of a vector, since in some implementations vectors can have more than `INT_MAX` elements. — Steve Jessop, Jun 15 '11 at 17:33
@Steve, i know there can be more than INT_MAX elements, but theres no chance in my code that would ever happen. I want this .sizei() function only in the places where i know theres not even 10k elements in worst cases. I would use .size() if the code actually would make that possible to have over 2 billion elements. It is very rare case that you ever would go to over 2 billion elements... if i would need that many elements i wouldnt need vector functionality at all there... i would probably use static size array then. — Rookie, Jun 15 '11 at 17:42
@Rookie: anyway, regardless of whether you should want to do this or not, the short answer is that there's no valid way in C++ to add member functions to classes once they're defined, and you're not allowed to mess around in the `std` namespace. It's going to have to be a non-member function. — Steve Jessop, Jun 15 '11 at 17:47
Oh, and if you want to irritate the people telling you off, remind them that they shouldn't be using `size_t`, rather `std::vector::size_type` ;-) — Steve Jessop, Jun 15 '11 at 17:52
Sounds like `something` should have been an unsigned quantity in the first place. Fix that, not `std::vector`. — Lightness Races in Orbit, Jul 10 '11 at 16:25

Nawaz · Answer 1 · 2011-06-15T17:59:06.867

6

First of all, why would you want that? I don't see any reason or advantage.

Anyway, you can do this:

template<typename T>
int size(const std::vector<T> &v) { return (int) v.size(); }

//use
std::vector<int> ints;
//...
int count = size(ints);

Still I don't see any point in doing that. You can simply write:

int count = (int) ints.size();

But I would still say it's no better than the following:

size_t count = ints.size(); //don't prefer anything over this. Always use size_t

Advice: avoid using `int` for size. Prefer `size_t`.

As for the edit in your question: why not use size_t, like this:

size_t something = 3;
if(arr.size() > something)

No warning. In my opinion, if you choose your data types consistently throughout your program, you won't run into a situation where you have to compare an int with a size_t, which is defined as an unsigned integral type.

Or, if there is some legacy code you have to work with that uses int for sizes, then I think it's better to use an explicit cast where you need it, instead of adding a function to the framework itself, which hides the potential problem:

int something = /*some legacy code API call or something */;
if(arr.size() > (size_t) something) 

//or even better;
size_t something = (size_t) /*some legacy code API call or something */;
if(arr.size() > something)

edited Jun 15 '11 at 17:59

answered Jun 15 '11 at 17:13

Nawaz


How is it better than just casting? That's not what he asked for. – littleadv Jun 15 '11 at 17:14
2

@littleadv I think it is exactly what he asked for: "a function that returns the .size() value as an integer" – Judge Maygarden Jun 15 '11 at 17:18
@littleadv: Who said the function is better than casting? And why is even casting any better? – Nawaz Jun 15 '11 at 17:21
2

`why would you want to do this?` It's not an uncommon requirement: Ex: You may have a function that returns the size of a particular named 'object' in some kind of structured storage. If the object is missing, -1 is returned, because we need to distinguish between missing and empty data. Also, missing data in these files is not sufficiently 'exceptional' to justify throwing). So, you often end up with code like `if (object.dataSize("Foo") == sizeof(MyStruct)) { readIt(); }`, which causes compiler warnings about comparing signed vs unsigned. – Roddy Jun 15 '11 at 17:25
@Roddy: You can still return `-1` with `size_t`. Its just that it will be equal to the value of `(size_t)-1`. (Note: its casting `-1` into `size_t` type). – Nawaz Jun 15 '11 at 17:27
separate function could work, i guess, but wouldnt it create bigger code since i have to overload every type with it ? why shouldnt i use `int` for the size() ? i dont have the need for over 2 billion items in the vector, never. and even if i do, i would then ofc just use .size() instead of the sizei() i wanted to implement. check my edits i explained better. – Rookie Jun 15 '11 at 17:28
@Nawaz. Good point. But your `dataSize` function may be in legacy code that's best left unchanged... – Roddy Jun 15 '11 at 17:29
@Judge : quote from the OP - *im not wanting to create a separate function, but a function in the std::vector itself.*. – littleadv Jun 15 '11 at 17:33
@littleadv: For your kind information, that is edited statement which you quoted. In the original question, it was not there. – Nawaz Jun 15 '11 at 17:36
@Nawaz - to me it was very obvious even from the title (not edited!), apparently the edit was required for you. – littleadv Jun 15 '11 at 17:37
FWIW, I'd probably write it `template <typename T> int isize(const T &v) { return (int) v.size(); }`. If we truly need this for vector, I don't see why we won't need it for deque too and other containers. Possibly I'd use a different cast, `boost::numeric_cast` would seem prudent. – Steve Jessop Jun 15 '11 at 17:50
How do you get around with code such as: `for(size_t i = 3; i >= 0; i--);` ? this will produce infinite loop with unsigned types. – Rookie Jun 15 '11 at 18:05
@Rookie : You can simply write `for(size_t i = 3; i != -1 ; i--);`. Its not infinite loop anymore. – Nawaz Jun 15 '11 at 18:08
@Rookie: `for (size_t i = 3+1; i-- > 0;) { }` works for values of 3 smaller than `SIZE_MAX`. But the "real" fix as far as the design of the standard libraries is concerned, is to use iterators instead, since they can take whatever type they need to in order to support all necessary values including `begin()` and `end()`. – Steve Jessop Jun 15 '11 at 18:10
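The decrement-in-the-condition idiom from the comment above can be sketched as a small function. This is only an illustration; the name `indicesDescending` is made up here and does not come from the thread.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the indices n-1, n-2, ..., 0 in visiting order.
inline std::vector<std::size_t> indicesDescending(std::size_t n) {
    std::vector<std::size_t> visited;
    // `i-- > 0` compares the old value of i with 0 and then decrements,
    // so the body sees n-1 down to 0 and the loop ends without relying on i >= 0.
    for (std::size_t i = n; i-- > 0;) {
        visited.push_back(i);
    }
    return visited;
}
```

After the loop ends, `i` has wrapped around to the largest `size_t` value, which is harmless because it is never used again.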
1

@Nawaz: OK, removed downvote. But I'm not voting up, because this idea of using unsigned types for *numbers*, is IMHO social group thinking at its very worst. It's like a religious belief. Every single person knows that it's insane, but also (incorrectly) thinks that most everybody else think it's how things are. And expresses outwardly the belief that he/she thinks the group has. But, there are no advantages, not even the mnemonic one often cited (like comments, but `typedef` covers that). And there are lots of extreme disadvantages. Umpteen zillions of wasted (wo)man-hours. – Cheers and hth. - Alf Jun 15 '11 at 18:12
1

I'm tempted to give a -1 for the bad advice about using `unsigned`. `unsigned`, in C++, doesn't work very well, and causes any number of problems. It's best to avoid it unless you're doing bitwise operations or need its modulo properties. (I don't know where this mania for using unsigned types comes from. I've not seen it in any of the better tutorial works.) – James Kanze Jun 15 '11 at 18:19
@Nawaz, yeah that would work, but it wouldnt be nice to read anymore, and converting to unsigned types like that will most probably result into bugs like that... i've done it, and i find it safer to just use ints instead, less problems, more readable. @Steve, as i said above... its really hard to see whats going in such codes. – Rookie Jun 15 '11 at 18:28
2

+1 for using `size_t` as a `vector` can't have a negative size (at least not in this universe) – Thomas Matthews Jun 15 '11 at 18:35
Hi guys, please don't use the commenting system as a chat room. It is for leaving a few comments and prods for more information to a question or answer, not for long debates. The reason behind this is that most of the time (and this is one of them), a lot if not all the comments belong as edits to the question/answer to make that more complete. If I have to read a half-page answer + 3 pages of comments, the focus on the comments is too big. Please edit in pertinent details into the answer instead. If you really need to chat, find/create a chat-room on the Chat site, link at the top of the page – Lasse V. Karlsen Jun 16 '11 at 11:48

Cheers and hth. - Alf · Accepted Answer · 2011-06-17T00:47:01.880

2

As a rule, in C and C++ you should never use an unsigned type such as size_t to restrict the domain. That's because (1) these languages provide no range checking, and (2) they do provide unreasonable implicit promotions. No range checking means no advantage, and unreasonable implicit promotions mean very undesirable disadvantages, so it's a plain stupid thing to do: no advantage, very undesirable disadvantages.

However, the standard libraries for these languages do that. They do it for historical reasons only, caught irreversibly in early decisions which at one time made sense. This has both extremely silly consequences such as C99 requiring 17 (!) bits for ptrdiff_t, and it has the aforementioned extremely undesirable consequences such as using inordinately much time on hunting down bugs resulting from implicit promotions (etc.). For example, in C++ you are practically guaranteed that std::string( "bah!" ).length() < -5 – which can easily trip you up and anyway is as silly as it is possible to design.
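The comparison quoted above can be checked directly. Below is a minimal sketch in which the conversion the compiler applies implicitly is written out as an explicit cast, so it compiles without a sign-compare warning. The function name `lengthIsLessThanMinusFive` is invented for illustration, and the sketch assumes a platform where `std::string::size_type` is at least as wide as `int` (true of the common ones).

```cpp
#include <string>

// Hypothetical helper: spells out what `s.length() < -5` means once the
// usual arithmetic conversions have turned -5 into an unsigned value.
inline bool lengthIsLessThanMinusFive(const std::string& s) {
    const int minusFive = -5;
    // Converting -5 to the unsigned size_type wraps it to a huge positive value,
    // so any realistic length compares as smaller.
    return s.length() < static_cast<std::string::size_type>(minusFive);
}
```

With `std::string("bah!")` the function returns true, which is the surprising result the answer describes.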

Now, you can't infuse new member functions in std::vector, but you can add a freestanding function. A good name is countOf. Template it so that it can be applied to just about anything (raw arrays, vectors, etc.).

The triad of functions startOf, endOf and countOf were, as far as I know, first identified by Dietmar Kuehl. C++0x will have std::begin and std::end, but AFAIK no corresponding std::size. In the meantime you can just define this support, which allows you to treat any kinds of container plus raw arrays the same.
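For readers on newer toolchains: C++17 did add `std::size`, and C++20 added `std::ssize`, which returns a signed count. Here is a minimal sketch, assuming a wrapper name `signedSize` (hypothetical, not from this answer) that uses `std::ssize` when the library advertises it and otherwise falls back to a plain cast:

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

#if defined(__cpp_lib_ssize)
// C++20 and later: delegate to the standard std::ssize.
template <typename C>
std::ptrdiff_t signedSize(const C& c) {
    return static_cast<std::ptrdiff_t>(std::ssize(c));
}
#else
// Older standards: fall back to converting the unsigned size by hand.
template <typename C>
std::ptrdiff_t signedSize(const C& c) {
    return static_cast<std::ptrdiff_t>(c.size());
}
#endif
```

Either branch lets you write `signedSize(arr) > something` with a signed `something` and no mixed-sign comparison.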

An example implementation & further discussion is provided at my blog.

EDIT Adding some code, because it's requested in the comments.

Detection of suitable iterator type:

template< typename Type >
struct It
{
    typedef typename Type::iterator T;
};

template< typename Type >
struct It< Type const >
{
    typedef typename Type::const_iterator T;
};

template< typename ElemType, Size N >
struct It< ElemType[N] >
{
    typedef ElemType* T;
};

And the countOf, startOf and endOf functions, using that deduced iterator type:

template< typename T >
inline Size countOf( T const& c )           { return static_cast<Size>( c.size() ); }

template< typename T, Size N >
inline Size countOf( T (&)[N] )             { return N; }

template< typename T >
inline typename It<T>::T startOf( T& c )    { return c.begin(); }

template< typename T, Size N >
inline T* startOf( T (&a)[N] )              { return a; }

template< typename T >
inline typename It<T>::T endOf( T& c )      { return c.end(); }

template< typename T, Size N >
inline T* endOf( T (&a)[N] )                { return a + N; }

where Size is a typedef for ptrdiff_t.

Note: in 64-bit Windows int (and even long) is 32-bit. Hence, int is in general not sufficient for a really large array. ptrdiff_t is guaranteed to be able to represent the difference between any two pointers, when that difference is well-defined.
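To try the two `countOf` overloads on their own, the snippet needs the `Size` typedef and a couple of headers. The following is a sketch assembled from the code above, not the blog's full implementation:

```cpp
#include <cstddef>
#include <vector>

typedef std::ptrdiff_t Size;  // signed count type, as described in the answer

// Containers: convert the unsigned size() result to the signed Size.
template< typename T >
inline Size countOf( T const& c )   { return static_cast<Size>( c.size() ); }

// Raw arrays: the element count is deduced from the array type.
template< typename T, Size N >
inline Size countOf( T (&)[N] )     { return N; }
```

Because `countOf` returns a signed value, a comparison such as `countOf(v) > -1` behaves as ordinary arithmetic suggests.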

Cheers & hth.

edited Jun 17 '11 at 00:47

answered Jun 15 '11 at 17:54

Cheers and hth. - Alf


"early decisions which at one time made sense" - did they ever make sense? If there were no comparison operators (and possibly you'd need there to be no mixed arithmetic either, just explicit conversion) between signed and unsigned types, then I could sort of see the point of using unsigned types in the standard libraries. As things stand, though, you can't use only signed types, you can't use only unsigned types, and you can't safely mix them. Which as you say is a bit of a bind. – Steve Jessop Jun 15 '11 at 18:21
I think it's important that your types respect the fundamental properties of what they represent. It makes perfect sense, for example, to talk about the difference between two sizes, and the standard formula for that difference is `abs(size1, size2)`. If this doesn't work for the type you're using for sizes, then you're using the wrong type. – James Kanze Jun 15 '11 at 18:22
@James: but conversely, if the type you're using for indexes doesn't let you talk about a *signed* difference, then you're using the wrong type. So you can't use the same fixed-width type for both sizes and indexes, and you can't use a pair of corresponding signed/unsigned types because they don't fit, you can't have the signed one express signed differences in range of the unsigned one *and* the unsigned one express absolute differences in range of the signed one. Introduce a bigint builtin type to C and C++ and we could have sensible arithmetic, but we'd lose that close-to-the-metal vibe. – Steve Jessop Jun 15 '11 at 18:29
@Steve: the only concrete example I know of where the unsigned size type makes sense, is 16-bit x86 programming. Then the other options would be either 32-bit signed type, with some inefficiency incurred, or 16-bit signed type with string size limited to 32KB (where conceivably one might need e.g. a 48KB string!). This conundrum is reflected by the 17-bit C99 requirement for `ptrdiff_t`, which is signed and needs to be able to represent the difference between any two sizes. In earlier discussions about this, on Usenet, I've taken the stand that one could have different rules for e.g. embedded. – Cheers and hth. - Alf Jun 15 '11 at 18:32
@Alf: what about if `ptrdiff_t` didn't have to represent the difference between any two values of type `size_t`, just the difference between any two sizes of actual objects? Then you'd use a signed type for everything. Both kinds of difference work for real objects and pointers, they just don't work for the types (and hence James still doesn't get his, "types respect the fundamental properties of what they represent"). `distance(INT_MAX, INT_MIN)` is either undefined or defined to be an absurd value (wraparound), but happily doesn't represent a real distance. – Steve Jessop Jun 15 '11 at 18:41
Not that I'm claiming this can be retro-fitted into C now, so possibly the suggestion is irrelevant to the point, which is that C is borked. – Steve Jessop Jun 15 '11 at 18:43
Finally, a sensible answer! @Steve: we don't need bigint, just direct access to the "carry flag"; one extra bit is enough to resolve all addition/subtraction overflows. As for when these decisions made sense: in the days of K&R C when integer promotions were more sensible, i.e., (int)-2 < (unsigned)3 was true. – zvrba Jun 16 '11 at 06:11
@Steve I'm not sure I understand your comment (responding to me): my example used sizes, but it could have used indexes just as well. About the only difference is that it does make sense to talk about the signed difference for indexes (what you have to add to `i` in order to get `j`), although not for sizes. (In other words, be it sizes or indexes, you should normally use `int`.) – James Kanze Jun 16 '11 at 08:29
@James: In that case, I didn't understand your comment. `int` doesn't have the ability to express the (unsigned) distance between any two `int` values, whereas `unsigned int` does have the ability to express the distance between any two `unsigned int` values. So I thought you were advocating unsigned types for sizes. – Steve Jessop Jun 16 '11 at 09:25
@Steve All numeric types on a finite machine have limited range. `abs(a - b)` gives the correct difference between `a` and `b` as long as the results are in range, provided `a` and `b` are `int`. It will give a completely wrong result if they are `unsigned`: `abs(4U - 5U)` will give some horribly large number, and not 1. – James Kanze Jun 16 '11 at 10:29
@James: true, I didn't realise that what I thought was a hypothetical `abs(size1, size2)` function was in fact a typo... – Steve Jessop Jun 16 '11 at 10:32
can you suggest a way to write the template function for this? others have done it `template <typename T> int sizei(const vector<T> &arr)` but isnt it better to do just `template <typename T> inline int sizei(const T &arr)` ? notice inline and how it should work with std::map's too since its not bound to just vector type. - also, any idea how much this will increase the executable size ? since i have lots of vector/map types in my code where i use this. – Rookie Jun 16 '11 at 11:40
@Rookie: see the link at the end of the answer. – Cheers and hth. - Alf Jun 16 '11 at 16:12
well i was more like asking you to put a simple answer in your answer instead of making people to surf on external websites... – Rookie Jun 16 '11 at 18:26
@Rookie: it would significantly increase the size of the answer. It's like, you don't put a car in the back of a van. Except if you're from Kazakhstan. – Cheers and hth. - Alf Jun 16 '11 at 19:05
I disagree with your claim that "in C and C++, you should never use an unsigned type such as size_t to restrict the domain". Types serve many purposes, one of them is documentation. Unsigned types give you the opportunity to document your intention that a certain variable should only take on positive values. I agree that it would be nicer if we also got static type checking out of unsigned types; but the fact that we don't is no reason not to use them. – HighCommander4 Mar 26 '12 at 02:55
@HighCommander: i understand your pov, and i'm sure that you will not immediately change it. but re self-documenting code, consider that since there is no built-in range checking in C++, the documentation effect is only via the suggestive name. and a suggestive name can be defined for any type, via a `typedef`. :-) – Cheers and hth. - Alf Mar 26 '12 at 08:50

score 1 · Answer 3 · edited May 23 '17 at 12:20

1

I would favor using an explicit cast to int instead of a function: static_cast<int> (v.size()). Even better would be to always use size_t when dealing with memory sizes. For example, favor for (size_t i=0; i < v.size(); ++i) over for (int i=0; i < (int) v.size(); ++i). Use the right type for the job. You should not be comparing std::vector sizes with a signed type.

See the following references for why you should prefer size_t to int:

edited May 23 '17 at 12:20

Community


answered Jun 15 '11 at 17:17

Judge Maygarden


Please take the vector by const reference in `vector_size` – Alexandre C. Jun 15 '11 at 17:30
it still remains unclear to me why exactly i shouldnt compare signed types to std::vector sizes... i've got through this signed/unsigned problems many times and i have learnt to just use ints for everything, in some rare cases i use unsigned ints when its REALLY needed. – Rookie Jun 15 '11 at 17:37
@Rookie: the reason not to do that comparison is that in the event your signed value is negative, it will (probably) get converted to a very large unsigned value, and the comparison will come out claiming that the vector is smaller when it's really bigger. If you know that your integer value truly will never be negative, you might as well use an unsigned type for it, since it makes it easier to use the standard libraries. C++ containers simply don't play nicely with your preference for signed types: the compiler warning is telling you to bridge the gap by explicitly converting. – Steve Jessop Jun 15 '11 at 17:57

score 1 · Answer 4 · answered Jun 15 '11 at 17:21

1

You can derive from vector as follows:

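#include <climits>    // INT_MAX
#include <cstddef>    // std::size_t
#include <stdexcept>  // std::range_error
#include <vector>

template<typename T>
class my_vector : public std::vector<T>
{
  // missing constructors!

public:
  // Hides std::vector<T>::size() and returns a range-checked int instead.
  int size() const
  {
    if (std::vector<T>::size() > static_cast<std::size_t>(INT_MAX))
      throw std::range_error("too many elements in vector");
    return static_cast<int>(std::vector<T>::size());
  }
};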

The down-side is that you'll have to define and forward constructors yourself.
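With a C++11 compiler, the constructor forwarding can be avoided by inheriting the base class constructors. Here is a minimal sketch under that assumption. The member is named `sizei` (the name from the question) rather than `size`, a choice made here so that `std::vector<T>::size()` stays visible. The caveat about `std::vector` having no virtual destructor still applies.

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>
#include <vector>

template <typename T>
class my_vector : public std::vector<T> {
public:
    // C++11 inheriting constructors: reuse the constructors of std::vector<T>.
    using std::vector<T>::vector;

    // Range-checked signed size; throws if the count does not fit in an int.
    int sizei() const {
        if (this->size() > static_cast<std::size_t>(INT_MAX))
            throw std::range_error("too many elements in vector");
        return static_cast<int>(this->size());
    }
};
```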

answered Jun 15 '11 at 17:21

zvrba


Missing constructors, missing virtual destructors... recipe for disaster. – rubenvb Jun 15 '11 at 17:23
I wrote that constructors were missing. As for virtual destructors, they will be inherited from std::vector, if there are any at all. – zvrba Jun 15 '11 at 17:24
@zvrba: there is no virtual destructor, and that is exactly the problem. If you `delete` your `my_vector` class from a pointer to base, you're screwed. And no, telling yourself not to do that is not going to prevent it from happening. – rubenvb Jun 15 '11 at 17:29
@zvrba, the standard containers don't have virtual destructors which means subclassing them can be dangerous. You might be able to get away with it when the subclass doesn't introduce any member variables of its own and thus has an empty destructor. – Mark Ransom Jun 15 '11 at 17:30
2

@rubenvb: "telling yourself not to do that is not going to prevent it from happening" I'm not sure why you'd dynamically allocate this class at all, but if you do then using `shared_ptr<vector<T> >(new my_vector<T>(...))` would prevent anything bad from happening, since `shared_ptr` doesn't need a virtual destructor in order to get it right. And if not with `shared_ptr`, how are you managing these dynamically-allocated vectors? – Steve Jessop Jun 15 '11 at 17:41
@Steve: OK, didn't know that. I was just referring to eg [this discussion right here on SO](http://stackoverflow.com/questions/922248/is-there-any-real-risk-to-deriving-from-the-c-stl-containers) – rubenvb Jun 15 '11 at 17:49
5

@rubenvb: yeah, the claim "you cannot handle them polymorphically" is a bit misleading. There's one thing that you can't do with them, and that's polymorphically `delete`. But for containers, and for a derived class like this that only adds a non-virtual function, it seems pretty unlikely to me that you'd use polymorphism at all, still less polymorphically delete. And `shared_ptr` can fake it for you. The real reason IMO not to do this is that it's futile to want everything to be a member function, when it could perfectly well be a non-member, but it's what the questioner asked for. – Steve Jessop Jun 15 '11 at 18:03
@Steve: (tiny nudge +1 comment upvote) Thanks for taking the time to explain. Always happy to learn new things. – rubenvb Jun 15 '11 at 18:35
@Mark: if you want safety, use something else than C++. – zvrba Jun 16 '11 at 05:59
@zvrba, it's not about safety - it's about being aware of the many corner cases in the language and how to use them properly. – Mark Ransom Jun 16 '11 at 13:04
@Mark: I call it "safety", you call it "corner cases". What's the actual difference, when violating most of corner-case rules ends up as undefined behavior? – zvrba Jun 17 '11 at 06:57

score 0 · Answer 5 · answered Jun 15 '11 at 17:38

The quick answer for .size() is: no. For std::vector, the only things you can customize are the element type it stores and the allocator (default new/delete, not normally overridden), along with the methods that take an InputIterator.

Most people are going to ask why you would want a different size_t. If it's just the annoying warnings, you can cast, or use unsigned integers to iterate/check against size(). (If it's a lot of code, you're going to have to find/replace.) If it is about handling empty conditions, you could wrap the vector in a class with some smarts. As an aside, since I don't know your problem at hand, a good place to look for ideas and already implemented features is the standard library's algorithms, such as sort, for_each, find, and lots more.

For std algorithms, see: http://www.sgi.com/tech/stl/table_of_contents.html

I'm a slow typer :) ... but I did guess everybody would say WHY? I say go with Nawaz's answer — ginbot, Jun 15 '11 at 17:42

score 0 · Answer 6 · answered Jun 15 '11 at 17:58

While @Nawaz, in my opinion, provided the most appropriate answer, if you really want to add an additional member to std::vector<> it isn't really possible. @zvrba provided the only way that could be accomplished, but as stated in the comments there the std container types do not have virtual destructors and therefore are not meant to be subclassed from.

However, you could implement a new type of vector using a container adaptor, like this:

template <class T>
class my_vector
{
public:
   int size_i() const
   {
      return static_cast<int>(container_.size());
   }

private:
   std::vector<T> container_;
};

The drawback here is that you have to explicitly expose the functions of the container that you actually need to support. If you are using 'std::vector' normally throughout your code, this would likely be a significant change. See 'std::queue' for an implementation example of a container adaptor.
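To make the drawback concrete, here is a sketch of the adaptor with a few forwarded members. Which members to expose is an assumption made for illustration; a real adaptor would forward whatever the calling code actually uses.

```cpp
#include <cstddef>
#include <vector>

template <class T>
class my_vector
{
public:
   // Each std::vector operation the callers need must be forwarded by hand.
   void push_back(const T& value)             { container_.push_back(value); }
   T& operator[](std::size_t i)               { return container_[i]; }
   const T& operator[](std::size_t i) const   { return container_[i]; }

   // The signed size the question asked for.
   int size_i() const
   {
      return static_cast<int>(container_.size());
   }

private:
   std::vector<T> container_;
};
```

Anything not forwarded (iterators, `insert`, `erase`, and so on) is simply unavailable, which is why switching existing code from `std::vector` to such an adaptor can be a large change.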

You could expose the contained vector by reference through a method. For example, `std::vector<T>& operator()() { return container_; }` and then use it like so: `arr().push_back(foo);`. – Judge Maygarden, Jun 15 '11 at 18:40
Good point @Judge Maygarden, to take that a step further, you could potentially implement an implicit conversion operator to std::vector and continue to use my_vec in the code (mostly) unmodified. — Chad, Jun 15 '11 at 19:05
