6

Sometimes I need to pass a C string to a function using the common C++ iterator range interface [first, last). Is there a standard C++ iterator class for those cases, or a standard way of doing it without having to copy the string or call strlen()?

EDIT: I know I can use a pointer as an iterator, but I would have to know where the string ends, what would require me to call strlen().

EDIT2: While I didn't know if such iterator is standardized, I certainly know it is possible. Responding to the sarcastic answers and comments, this is the stub (incomplete, untested):

class CStringIterator
{
public:
    CStringIterator(char *str=nullptr):
        ptr(str)
    {}

    bool operator==(const CStringIterator& other) const
    {
        if(other.ptr) {
            return ptr == other.ptr;
        } else {
            return !*ptr;
        }
    }

    /* ... operator++ and other iterator stuff */

private:
    char *ptr;
};

EDIT3: Specifically, I am interested in a forward iterator, because I want to avoid to iterate over the sring twice, when I know the algorithm will only have to do it once.

lvella
  • 12,754
  • 11
  • 54
  • 106
  • 1
    Is your c-string a `char*` or a `char[N]`? – NathanOliver Aug 09 '18 at 15:16
  • 1
    Raw pointers can be used as iterators – Slava Aug 09 '18 at 15:16
  • `char*`. To use it as an iterator, I'll have to know its size beforehand, which I don't unless I perform a O(n) operation (i.e. `strlen()`). – lvella Aug 09 '18 at 15:17
  • 3
    There is no magic C++ class that would deduce C style string length without `strlen()`, you either need to hold that value somewhere from the point the string is created or you have to find length – Slava Aug 09 '18 at 15:19
  • Yes, you need to use `strlen` - that's the downside of sticking with a C string instead of a C++ one. Unless it's a string *constant*, then you can use `sizeof`. – Mark Ransom Aug 09 '18 at 15:19
  • 1
    @Slava, no magic is needed, a simple iterator class returning `true` on `oprerato==` when the pointer points to `\0` would suffice. I want to know if such class is standardized, because it seems a common enough use case. – lvella Aug 09 '18 at 15:23
  • 3
    There's nothing standard, but the author of range-v3 [has written](http://ericniebler.com/2014/02/16/delimited-ranges/) on the topic. If you want, you can make a custom iterator where `end()` is actually a default-constructed one or something, comparing equal when the other is at the null terminator. There's definitely no *need* to use `strlen`. You could also consider using an intermediate `string_view` for the equivalent of `strlen` as a standard approach, which also allows things like range-for. – chris Aug 09 '18 at 15:23
  • 1
    @lvella that will not work for random access iterator, so you can create such one manually and make it forward iterator. No there is no standard one for this. – Slava Aug 09 '18 at 15:25
  • 2
    Btw you should edit your question and make it clearer - seems that nobody really understood what you want. – Slava Aug 09 '18 at 15:28
  • If all you have is a pointer to the first character of a string how can you avoid having to obtain its length? – Galik Aug 09 '18 at 15:28
  • @Galik OP means something like `std::ostream_iterator` – Slava Aug 09 '18 at 15:29
  • The first comment, from NathanOliver, is critical. "C Strings" can mean character pointers or character arrays, and there's a distinct answer for each. – Drew Dormann Aug 09 '18 at 15:29
  • @Slava Well its not very clear. Standard iterator interfaces accept pointers (which are iterators after all). – Galik Aug 09 '18 at 15:30
  • @Galik I know, everyone is confused, but from comment "a simple iterator class returning true on oprerato== when the pointer points to \0 would suffice" made that clear – Slava Aug 09 '18 at 15:30
  • @lvella Can you post your comments into the question itself to make it clearer? – Galik Aug 09 '18 at 15:32
  • 2
    You can probably use the one from GSL zstring_span, or roll your own. – n. m. could be an AI Aug 09 '18 at 15:43
  • @Slava From the iterator hierarchy I see [here](http://www.cplusplus.com/reference/iterator/), I don't see why it can't be a random access iterator. It only needs a special treatment to handle the end of string when you don't know its size. – lvella Aug 09 '18 at 15:51
  • Are you trying to remove your need to call strlen() (doable) or do you want a solution that avoid the runtime cost of strlen() (impossible) ? What would help is an edit where you show the desired usage of this special iterator. For example, std::sort with ptr to begginning of string and special iterator meaning "the end, when found". I can see a solution to this, but it would just hide the strlen, not make the solution more efficient. – Jeffrey Aug 09 '18 at 15:52
  • @lvella how about "Supports arithmetic operators + and -" ? – Slava Aug 09 '18 at 16:03
  • @Slava, if not NULL, add to/subtract from pointer. – lvella Aug 09 '18 at 16:07
  • 1
    @lvella how about `it1` - `it2` when `it1` is your synthetic end iterator? Or just `it1 - 1` ? – Slava Aug 09 '18 at 16:08
  • Actually such iterator can be bidirectional iterator not just forward one, one can also implement it as random_access but to invoke `strlen()` only when random access functionality is accessed from sentinel end.. – Slava Aug 09 '18 at 16:33
  • 1
    Btw I do not see answers/comments as sarcastic, you just confused everybody by making your question not quite clear, hense the answers. Actually your question is valid and quite good, but you did not formulate it properly. – Slava Aug 09 '18 at 16:37

9 Answers9

4

There isn't any explicit iterator class, but regular raw pointers are valid iterators as well. Problem with C-strings, though, is that they do not come with a native end iterator, which makes them unusable in range based for loops – directly at least...

You might like to try the following template, though:

template <typename T>
class Range
{
    T* b;
public:
    class Sentinel
    {
        friend class Range;
        Sentinel() { }
        friend bool operator!=(T* t, Sentinel) { return *t; }

    public:
        Sentinel(Sentinel const& o) { }

    };
    Range(T* begin)
            : b(begin)
    { }
    T* begin() { return b; }
    Sentinel end() { return Sentinel(); }
};

Usage:

for(auto c : Range<char const>("hello world"))
{
    std::cout << c << std::endl;
}

It originally was designed to iterate over null-terminated argv of main, but works with any pointer to null terminated array – which a C-string is as well...

Secret is comparing against the sentinel, which actually does a totally different comparison (current pointer pointing the terminating null (pointer))...

Edit: Pre-C++17 variant:

template <typename T>
class Range
{
    T* b;
public:
    class Wrapper
    {
        friend class Range;
        T* t;
        Wrapper(T* t) : t(t) { }
    public:
        Wrapper(Wrapper const& o) : t(o.t) { }
        Wrapper operator++() { ++t; return *this; }
        bool operator!=(Wrapper const& o) const { return *t; }
        T operator*() { return *t; }
    };
    Range(T* begin)
            : b(begin)
    { }
    Wrapper begin() { return Wrapper(b); }
    Wrapper end() { return Wrapper(nullptr); }
};
Aconcagua
  • 24,880
  • 4
  • 34
  • 59
  • This falls down for my favourite old-style for-loop form: `for( auto i = x.begin(), e = x.end(); i != e; ++i)`, because begin and end are not the same type. – Gem Taylor Aug 09 '18 at 15:47
  • @GemTaylor Since C++17 (14 already?), range based for loop does not require the iterator types to be equal, as long as they can be compared against each other. So you don't have to fall back... By the way, I have a pre-C++17 variant available somewhere as well... – Aconcagua Aug 09 '18 at 15:49
  • My largest concern would be that the end iterator is no longer random access. You can't do `*(x.end() - 1)`. – Max Langhof Aug 09 '18 at 15:50
  • @MaxLanghof Well, this is a pure forward iterator, just as in a singly linked list... – Aconcagua Aug 09 '18 at 15:55
  • @Aconcagua - cool! I was wondering why I couldn't compile the range variant on c++11, which is how the question is tagged. But I use the old-style for other reasons, when required. I'm currently trying to work out the minimal iterator object that could convert itself into a safe strlen() if used for random access or if end was decremented, but avoids the strlen call until required. – Gem Taylor Aug 09 '18 at 15:56
  • @Aconcagua but if it were tagged as a forward-only iterator some added safety would be gained. Sadly, as a raw pointer it can't be tagged. – Gem Taylor Aug 09 '18 at 15:57
  • Added the pre-C++17 variant as well. The wrapper might even be tagged, so it might be your choice (probably better name it `iterator`). I'll be leaving this to you... – Aconcagua Aug 09 '18 at 16:01
  • The second variant might even be usable in std algorithm library with those algorithms that are satisfied with forward-only iterators (which is not possible with the first variant as iterator types differ). – Aconcagua Aug 09 '18 at 16:03
  • I would prefer the c++11 version actually checked that the second iterator was "end" before doing the nullchar check and otherwise compared to it. Note here I am thinking library code where the user is not aware of the implementation details and limitations. – Gem Taylor Aug 09 '18 at 16:24
  • @GemTaylor That's really up to you... Both versions arose from just playing around, if it were about null-terminated pointers, I'd iterate via classic loop anyway (`X* p; /* = ... */ for(;*p;++p)`) (see [examples on argv](https://stackoverflow.com/a/36012497/1312382)) – Aconcagua Aug 09 '18 at 16:28
  • @Aconcagua `this is a pure forward iterator` your classes are not (standard compatible) iterators at all, as they don't meet the requirements. I don't think they're guaranteed to work with any standard algorithms. They're fine to use with the range-for-loop though, I think. – eerorika Aug 10 '18 at 21:49
  • @user2079303 Range based for loop is what they originally where designed for. The second variant (not the first) would work as well with e. g. std::find, std::find_if, std::for_each. They wouldn't with std::sort, for instance, as not random access iterators... – Aconcagua Aug 11 '18 at 00:41
  • @Aconcagua The second variant is not an iterator (either). Those algorithms may work if their implementation doesn't rely on the requirements that iterators must have. But the algorithms *may* rely on those requirements in which case they won't work. – eerorika Aug 11 '18 at 01:02
4

Actually, yes - sort of. In c++17.

C++17 introduces std::string_view which can be constructed from a c-style string.

std::string_view is a random access (proxy) container which of course fully supports iterators.

Note that although constructing a string_view from a const char* will theoretically call std::strlen, the compiler is allowed to (and gcc certainly does) elide the call when it knows the length of the string at compile time.

Example:

#include <string_view>
#include <iostream>

template<class Pointer>
struct pointer_span
{
    using iterator = Pointer;

    pointer_span(iterator first, std::size_t size)
    : begin_(first)
    , end_(first + size)
    {
    }

    iterator begin() const { return begin_; }
    iterator end() const { return end_; }

    iterator begin_, end_;
};

int main(int argc, char** argv)
{
    for(auto&& ztr : pointer_span(argv, argc))
    {
        const char* sep = "";
        for (auto ch : std::string_view(ztr))
        {
            std::cout << sep << ch;
            sep = " ";
        }
        std::cout << std::endl;
    }
}

See the example output here

Richard Hodges
  • 68,278
  • 7
  • 90
  • 142
  • I do not think this is what OP wants. She asks for something like `std::istream_iterator` with sentinel end so algorithms which requires forward or bidirectional iterator would not have to scan the string twice. – Slava Aug 09 '18 at 16:42
  • @Slava there is already an answer to this question which proposes a solution like this. I note the 3rd edit in the OP's question about avoiding traversing the string twice. I have mentioned the possibility of elision of the first traverse. This answer is posted for the sake of complete information. – Richard Hodges Aug 09 '18 at 17:13
3

Is there a standard C++ iterator for C strings?

Yes. A pointer is an iterator for an array. C strings are (null terminated) arrays of char. Therefore char* is an iterator for a C string.

... using the common C++ iterator range interface [first, last)

Just like with all other iterators, to have a range, you need to have an end iterator.

If you know or can assume that an array fully contains the string and nothing more, then you can get the iterator range in constant time using std::begin(arr) (std::begin is redundant for C arrays which decay to the pointer anyway, but nice for symmetry) and std::end(arr) - 1. Otherwise you can use pointer arithmetic with offsets within the array.

A little bit of care must be taken to account for the null terminator. One must remember that the full range of the array contains the null terminator of the string. If you want the iterator range to represent the string without the terminator, then subtract one from the end iterator of the array, which explains the subtraction in the previous paragraph.

If you don't have an array, but only a pointer - the begin iterator - you can get the end iterator by advancing the beginning by the length of the string. This advancement is a constant operation, because pointers are random access iterators. If you don't know the length, you can call std::strlen to find out (which isn't a constant operation).


Example, std::sort accepts a range of iterators. You can sort a C string like this:

char str[] = "Hello World!";
std::sort(std::begin(str), std::end(str) - 1);
for(char c : "test"); // range-for-loops work as well, but this includes NUL

In the case you don't know the length of the string:

char *str = get_me_some_string();
std::sort(str, str + std::strlen(str));

Specifically, I am interested in a forward iterator

A pointer is a random access iterator. All random access iterators are also forward iterators. A pointer meets all of the requirements listed in the linked iterator concept.

eerorika
  • 232,697
  • 12
  • 197
  • 326
  • Or I need an iterator class whose `oprerato==()` returns `true` at the end of the string. – lvella Aug 09 '18 at 15:24
  • The equality operator returns true at the end of the string when you compare it to the end iterator. – eerorika Aug 09 '18 at 15:31
  • @user2079303 Which is why you would use a sentinel as `end` iterator. Comparison with it would check whether the current character is `\0`. – Max Langhof Aug 09 '18 at 15:36
3

It is possible to write such iterator, something like this should work:

struct csforward_iterator : 
    std::iterator<std::bidirectional_iterator_tag, const char, void> {

    csforward_iterator( pointer ptr = nullptr ) : p( ptr ) {}

    csforward_iterator& operator++()  { ++p; return *this; }
    csforward_iterator operator++(int) { auto t = *this; ++p; return t; }

    csforward_iterator& operator--()  { --p; return *this; }
    csforward_iterator operator--(int) { auto t = *this; --p; return t; }

    bool operator==( csforward_iterator o ) { 
        return p == o.p or ( p ? not ( o.p or *p ) : not *o.p ); 
    }
    bool operator!=( csforward_iterator o ) { return not operator==( o ); }

    void swap( csforward_iterator &o ) { std::swap( p, o.p ); }

    reference operator*() const { return *p; }
    pointer operator->() const { return p; }
private:
    pointer p;
};

live example

though unfortunately standard one is not provided and it probably would be template over char type (like std::string ).

Slava
  • 43,454
  • 1
  • 47
  • 90
  • Note that if the reason for this iterator is to "avoid iterating the range twice" as an optimization, this isn't necessarily at all faster, since there are potentially three checks per increment (as opposed to two checks in the case of two iterations over the range using a single check each). Of course, an iterator like this can be useful for purposes other than optimization. – eerorika Aug 10 '18 at 00:07
  • However, unless I'm mistaken, I don't think this implementation meets the requirements of an InputIterator (which all BidirectionalIterators are). In particular, this requirement (slightly altered for context): `If i == j and (i, j) is in the domain of == then *i is equivalent to *j.` – eerorika Aug 10 '18 at 00:08
  • @user2079303 I slightly changed implementation of `==` this should cover cases when both iterators are not sentinel end. – Slava Aug 10 '18 at 00:52
  • Seems good. Few requirements still missing from (Input)Iteartor status: arrow operator and swap. – eerorika Aug 10 '18 at 21:42
  • @user2079303 added swap and arrow and fixed ==, thanks – Slava Aug 11 '18 at 00:32
  • A member swap is not sufficient (nor necessary) for Swappable. You need a free function :) – eerorika Aug 11 '18 at 00:59
1

I'm afraid not, for last you'll need a pointer to the end of the string for which you'll need to call strlen.

Tom
  • 43,583
  • 4
  • 41
  • 61
  • 1
    If you can assume null-terminated strings, your answer is incorrect. To know whether you are at the end of the string you would only have to check the current character. – Max Langhof Aug 09 '18 at 15:35
  • The question implies they're normal null terminated c strings. For c++ iterators it's a comparison of the current iterator to the end iterator not checking a current iterator if it's at the end - so character check isn't appropriate. – Tom Aug 09 '18 at 15:41
  • 1
    You can do a character check in the comparison of two iterators. See the answer by Aconcagua. – Max Langhof Aug 09 '18 at 15:43
  • The question is specifically about the existence of a standard iterator for it, that is why I am accepting this one. – lvella Aug 09 '18 at 15:52
1

If you have a string literal, you can get the end iterator without using std::strlen. If you have only a char*, you'll have to write your own iterator class or rely on std::strlen to get the end iterator.

Demonstrative code for string literals:

#include <iostream>
#include <utility>

template <typename T, size_t N>
std::pair<T*, T*> array_iterators(T (&a)[N]) { return std::make_pair(&a[0], &a[0]+N); }

int main()
{
   auto iterators = array_iterators("This is a string.");

   // The second of the iterators points one character past the terminating
   // null character. To iterate over the characters of the string, we need to 
   // stop at the terminating null character.

   for ( auto it = iterators.first; it != iterators.second-1; ++it )
   {
      std::cout << *it << std::endl;
   }
}
R Sahu
  • 204,454
  • 14
  • 159
  • 270
  • I think it would be better to "fix" .second in the template, especially if you start using this pair as a lightweight string_view – Gem Taylor Aug 09 '18 at 15:50
  • @GemTaylor, I thought about it but decided against it. One could potentially use the function with an array of `char`s that may have any number of null characters, including none. – R Sahu Aug 09 '18 at 15:54
  • True. The fix could check if the previous character was null. If there are multiple terminator nulls, and it is important not to visit them, then you are sunk with this approach. – Gem Taylor Aug 09 '18 at 16:00
  • " If you have only a char*, you cannot get the end iterator without using std::strlen." this is not quite true, one can implement forward iterator like `std::ostream_iterator` that does not need to know length – Slava Aug 09 '18 at 16:05
  • @Slava, true. Updated the answer. – R Sahu Aug 09 '18 at 16:07
  • Right and "you'll have to write your own iterator class" is the key question here and most of want OP wants, not just mention it is possible to implement IMHO. – Slava Aug 09 '18 at 16:44
1

For ultimate safety and flexibility, you end up wrapping the iterator, and it has to carry some state.

Issues include:

  • random access - which can be addressed in a wrapped pointer by limiting its overloads to block random access, or by making it strlen() on need
  • multiple iterators - when comparing with each other, not end
  • decrementing end - which you could again "fix" by limiting the overloads
  • begin() and end() need to be same type - in c++11 and some api calls.
  • a non-const iterator could add or remove content

Note that it is "not the iterator's problem" if it is randomly seeked outside the range of the container, and it can legally seek past a string_view.end(). It is also fairly standard that such a broken iterator could not then increment to end() any more.

The most painful of these conditions is that end can be decremented, or subtracted, and dereferenced (usually you can't, but for string it is a null character). This means the end object needs a flag that it is the end, and the address of the start, so that it can find the actual end using strlen() if either of these operations occurs.

Gem Taylor
  • 5,381
  • 1
  • 9
  • 27
0

Is there a standard C++ iterator class for those cases, or a standard way of doing it without having to copy the string

Iterators are a generalization of pointers. Specifically, they're designed so that pointers are valid iterators.

Note the pointer specializations of std::iterator_traits.

I know I can use a pointer as an iterator, but I would have to know where the string ends

Unless you have some other way to know where the string ends, calling strlen is the best you can do. If there were a magic iterator wrapper, it would also have to call strlen.

Useless
  • 64,155
  • 6
  • 88
  • 132
0

Sorry, an iterator is something that is normally obtained from an iterable instance. As char * is a basic type and not a class anymore. How do you think something like .begin() or .end(), can be achieved.

By the way, if you need to iterate a char *p knowing it is nul terminated. you just can do the following.

for( char *p = your_string; *p; ++p ) {
    ...
}

but the thing is that you cannot use iterators as they are defined in C++, because char * is a basic type, has no constructor, has no destructor or methods associated.

Luis Colorado
  • 10,974
  • 1
  • 16
  • 31