1

Given data format as "int,int,...,int,string,int", is it possible to use stringstream (only) to properly decode the fields?

[Code]

int main(int c, char** v)
{
    std::string line = "0,1,2,3,4,5,CT_O,6";
    char delimiter[7];
    int id, ag, lid, cid, fid, did, j = -12345;
    char dcontact[4]; // <- The size of <string-field> is known and fixed
    std::stringstream ssline(line);
    ssline >> id >> delimiter[0]
    >> ag >> delimiter[1]
    >> lid >> delimiter[2]
    >> cid >> delimiter[3]
    >> fid >> delimiter[4]
    >> did >> delimiter[5]  // <- should I do something here?
    >> dcontact >> delimiter[6]
    >> j;
    std::cout << id << ":" << ag << ":" << lid << ":" << cid << ":" << fid << ":" << did << ":";
    std::cout << dcontact << "\n";
}

[Output] 0:1:2:3:4:5:CT_6,0:-45689. The output shows that the stringstream did not stop after 4 chars when reading into dcontact. dcontact actually holds more than 4 chars, leaving j with garbage data.

Caesar
YamHon.CHAN

5 Answers

1

Yes. There is no overload of operator>>(istream&, char[N]) that takes the array size N into account, but there is one for char*, so the compiler picks that as the best match. The char* overload reads up to the next whitespace character, so it doesn't stop at the comma.

You could wrap dcontact in a struct and provide a specific overload to read into that struct. Otherwise you could use read, although it breaks your lovely chain of >> operators.

ssline.read( dcontact, 4 );

will work at that point.
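A minimal sketch of the read approach, assuming the text field is always exactly four characters with no surrounding white space. The helper name read_contact is hypothetical (it is not part of the question's code); it copies the four raw bytes into a std::string so that nothing depends on a null terminator.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: parses "int,int,int,int,int,int,XXXX,int" and returns
// the four-character field, or an empty string if parsing fails.
std::string read_contact(const std::string& line, int& last)
{
    std::istringstream in(line);
    int n;
    char comma;
    // Consume the six leading integer fields and the comma after each one.
    for (int i = 0; i < 6; ++i) {
        if (!(in >> n >> comma) || comma != ',') return std::string();
    }
    char dcontact[4];                      // raw bytes, NOT null-terminated
    if (!in.read(dcontact, 4)) return std::string();
    if (!(in >> comma >> last) || comma != ',') return std::string();
    return std::string(dcontact, 4);       // copy exactly 4 bytes
}
```

Called with the line from the question, this should hand back the four-character field and store the trailing integer in last. Because read does no formatting at all, a space after the comma would end up inside the field.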

To read up to a delimiter, incidentally, you can use getline. (get will also work, but the free function std::getline, which writes to a std::string, means you don't have to guess the length.)

(Note that other people have suggested get rather than read, but this will fail in your case because your dcontact array has no extra byte at the end for a null terminator. If you want dcontact to be null-terminated, make it 5 characters and use get, and the null will be appended for you.)
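As a sketch of the getline variant, assuming the text field may be of any length and never contains a comma: the free function std::getline reads into a std::string up to the delimiter and discards the delimiter. The helper name parse_line is hypothetical.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reads six integers, then a text field terminated by
// a comma, then one final integer. Returns false if any step fails.
bool parse_line(const std::string& line, std::string& contact, int& last)
{
    std::istringstream in(line);
    int n;
    char comma;
    // Consume the six leading integer fields and the comma after each one.
    for (int i = 0; i < 6; ++i) {
        if (!(in >> n >> comma) || comma != ',') return false;
    }
    // std::getline extracts up to the ',' and removes the ',' from the stream.
    if (!std::getline(in, contact, ',')) return false;
    return static_cast<bool>(in >> last);
}
```

Since the comma is consumed by std::getline itself, no separate delimiter read is needed before the last integer.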

CashCow
1

Slightly more robust (handles the ',' delimiter correctly):

#include <iostream>
#include <sstream>
#include <string>

template <char D>
std::istream& delim(std::istream& in)
{
  char c;
  if (in >> c && c != D) in.setstate(std::ios_base::failbit);
  return in;
}

int main()
{
  std::string line = "0,1,2,3,4,5,CT_O,6";
  int id, ag, lid, cid, fid, did, j = -12345;
  char dcontact[5]; // <- The size of <string-field> is known and fixed
  std::stringstream ssline(line);
  (ssline >> id >> delim<','>
          >> ag >> delim<','>
          >> lid >> delim<','>
          >> cid >> delim<','>
          >> fid >> delim<','>
          >> did >> delim<','> >> std::ws
          ).get(dcontact, 5, ',') >> delim<','>
          >> j;
  std::cout << id << ":" << ag << ":" << lid << ":"
            << cid << ":" << fid << ":" << did << ":"
            << dcontact << "\n";
}
ipc
  • Neat use of a manipulator to handle the separators, but note that `istream::get` doesn't skip white space, regardless of the `skipws` flag, unlike `>>`, which is used for everything else. So the input will fail if it has `"...5, CT_0, 6"`. – James Kanze Jan 03 '13 at 14:35
  • +1 because you remembered to fix dcontact to be 5 characters to use get. Of course you assume that it is intended to be a string and thus have such a terminator. – CashCow Jan 03 '13 at 17:12
0

Try this:

  #include <iostream>
  #include <sstream>
  #include <string>

  int main(int c, char** v) {
    std::string line = "0,1,2,3,4,5,CT_O,6";
    char delimiter[7];
    int id, ag, lid, cid, fid, did, j = -12345;
    char dcontact[5]; // <- 4 characters plus the '\0' that get() appends

    std::stringstream ssline(line);

    ssline >> id >> delimiter[0]
           >> ag >> delimiter[1]
           >> lid >> delimiter[2]
           >> cid >> delimiter[3]
           >> fid >> delimiter[4]
           >> did >> delimiter[5];

    ssline.get(dcontact, 5); // extracts at most 4 characters

    ssline >> delimiter[6]
           >> j;
    std::cout << id << ":" << ag << ":" << lid << ":" << cid << ":" << fid << ":" << did << ":";
    std::cout << dcontact << "\n" << j;
  }
Khaledvic
  • If you use get you need to edit dcontact to make it 5 characters or it will overflow. get adds a null terminator for you. read does not. – CashCow Jan 03 '13 at 17:11
0

The problem is that the >> operator for a string (std::string or a C style string) actually implements the semantics for a word, with a particular definition of word. The decision is arbitrary (I would have made it a line), but since a string can represent many different things, they had to choose something.

The solution, in general, is not to use >> on a string, ever. Define the class you want (here, probably something like Symbol), and define an operator >> for it which respects its semantics. Your code will be a lot clearer for it, and you can add various invariant controls as appropriate. If you know that the field is always exactly four characters, you can do something simple like:

#include <cctype>
#include <istream>
#include <string>

class DContactSymbol
{
    char myName[ 4 ];
public:
    //  ...
    friend std::istream&
    operator>>( std::istream& source, DContactSymbol& dest );
    //  ...
};

std::istream&
operator>>( std::istream& source, DContactSymbol& dest )
{
    std::istream::sentry guard( source );  //  skips leading white space if skipws is set
    if ( guard ) {
        std::string tmp;
        std::streambuf* sb = source.rdbuf();
        int ch = sb->sgetc();
        while ( source && (std::isalnum( ch ) || ch == '_') ) {
            tmp += static_cast< char >( ch );
            if ( tmp.size() > sizeof( dest.myName ) ) {
                source.setstate( std::ios_base::failbit );
            }
            ch = sb->snextc();  //  consume this character, then peek at the next
        }
        if ( ch == std::istream::traits_type::eof() ) {
            source.setstate( std::ios_base::eofbit );
        }
        if ( tmp.size() != sizeof( dest.myName ) ) {
            source.setstate( std::ios_base::failbit );
        }
        if ( source ) {
            tmp.copy( dest.myName, sizeof( dest.myName ) );
        }
    }
    return source;
}

(Note that unlike some of the other suggestions, for example using std::istream::read, this one maintains all of the usual conventions, like skipping leading white space dependent on the skipws flag.)

Of course, if you can't guarantee 100% that the symbol will always be 4 characters, you should use std::string for it, and modify the >> operator accordingly.

And BTW, you seem to want to read four characters into dcontact, although it's only large enough for three (since >> will insert a terminating '\0'). If you read any more than three into it, you have undefined behavior.

James Kanze
  • You can read 4 characters into dcontact safely, it will only be an issue if you treat it as a null-terminated string because it isn't. read() will put the next 4 bytes from the stream into dcontact, if there are that many left in the stream. You are right that it won't skip whitespace or interpret delimiters. It is just a "dumb" copy. Sometimes that is what you want - for it to just give you what's there and not try to work out what it thinks you want. Assuming your program writes the file, you should know what's in it. – CashCow Jan 04 '13 at 09:46
  • It's a text file. You can't assume anything about it. Even if your program wrote it, someone could have edited it since then. And since he likely needs a class for the type elsewhere, defining a `>>` on it would seem the most natural solution. – James Kanze Jan 04 '13 at 13:50
0

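Since the length of the string is known you can use std::setw. Note that >> into a char array stores at most width - 1 characters followed by a terminating '\0', so to read four characters dcontact has to be declared as char dcontact[5] and the width set to 5, as in

ssline >> std::setw(5) >> dcontact >> delimiter[6];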
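To illustrate how the width interacts with a char array, here is a small sketch, assuming a five-byte buffer so that four characters plus the terminating '\0' fit. The helper name read_with_setw is hypothetical.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: extracts a four-character field from the start of
// text and stores the character that follows it in next.
std::string read_with_setw(const std::string& text, char& next)
{
    std::istringstream in(text);
    char dcontact[5] = {};                  // 4 characters + '\0'
    // With a width of 5, >> stores at most 4 characters and appends '\0'.
    in >> std::setw(5) >> dcontact >> next;
    return in ? std::string(dcontact) : std::string();
}
```

The extraction still stops early at white space, so this only replaces the missing length limit, not the delimiter check.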
bames53