I am having a small problem processing a CSV file. I am very new to C++ and trying to learn. It is probably something minor that I am overlooking, but I have searched for answers online and cannot figure out where I am going wrong. I am trying to process a file that has multiple lines of comma-separated values (there is no comma at the end of each line, if that makes a difference). Of note, when I tried to post the text just now it did not include the paragraph breaks, so I had to add them manually; I am not sure whether that makes a difference.

Sale,11/9/14,11/9/14,AMAZON MKTPLACE PMTS,-8.99

Sale,10/4/14,10/5/14,AMAZON MKTPLACE PMTS,-13.08

Sale,10/3/14,10/5/14,AMAZON MKTPLACE PMTS,-9.82

Sale,10/2/14,10/3/14,AMAZON MKTPLACE PMTS,-45.48

Sale,8/21/14,8/22/14,AMAZON MKTPLACE PMTS,-9.99

Sale,11/8/14,11/9/14,Amazon.com,-64.7

Sale,10/1/14,10/2/14,APL* ITUNES.COM/BILL,-1.08

Sale,9/15/14,9/16/14,APL* ITUNES.COM/BILL,-1.08

I tried using getline to get each line into a stringstream then parse out each of those lines by the comma delimiter using the code below:

ifstream file("test1.csv"); 
string value, line;
while (getline(file, line)) {
    stringstream   linestream(line);
    while (getline(linestream, value, ',')) {
        cout << "Value:   " << value << endl;
    } // while
    cout << "Done Procesing" << endl;
} // while

The problem is that, for some reason, after every fifth comma-delimited token the word "Sale" overwrites the word "Value" in the output, and I cannot understand why. I would really appreciate some guidance.

David G
newgirl81
  • In the statement `while(getline(linestream,value,','))` you specify the delimiter to be a comma... there's no comma at the end of your line. – druckermanly Nov 25 '14 at 02:46
  • The input file probably contains DOS-style line endings, which consist of a `\r\n` sequence. `getline()` reads the `\r` into `line`, and so the last `value` on each line includes the carriage return. – Jonathan Wakely Nov 25 '14 at 02:47
  • @user2899162, so it will read to EOF which in this case is the end of the current line, because `linestream` only contains a single line. – Jonathan Wakely Nov 25 '14 at 02:47
  • Whoops! Didn't type out my whole theory-- and if the file is written with a different end of line identifier, the end of the line might not be where we think it is. EDIT: I was way beaten to this. – druckermanly Nov 25 '14 at 02:48
  • @JonathanWakely this'll only happen if she's running in Unix (or unix-style environment); otherwise the environment should translate the endings – M.M Nov 25 '14 at 02:58
  • Trim the value before printing. This answer has some elegant trim methods in C++. http://stackoverflow.com/a/217605/368818 – Asela Nov 25 '14 at 03:44
  • @JonathanWakely: If the word `Value:` gets overwritten the line end sequence clearly is a `\n\r` pair. Otherwise the `\r` would cause an invisible carriage return. – Dietmar Kühl Nov 25 '14 at 04:48

1 Answer

Based on the description (although it is not visible in the quoted text), each line after the first begins with a '\r' (carriage return) character. Some systems use a multi-character end-of-line sequence. Windows typically uses "\r\n" (carriage return, line feed), which would be replaced by a single '\n' when the file is opened in non-binary mode (i.e., when the flag std::ios_base::binary is not passed when creating the stream). However, this replacement does not happen for a "\n\r" sequence.
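
One way to check which line-ending characters actually end up in each line is to print them in escaped form. The following is only a sketch; the helper name show_line_endings is made up for illustration and is not part of the original code:

```cpp
#include <string>

// Hypothetical helper (not from the original post): returns a copy of `text`
// in which carriage returns and line feeds are spelled out as the two-character
// sequences "\r" and "\n", so that they become visible when printed.
std::string show_line_endings(const std::string& text) {
    std::string result;
    for (char c : text) {
        if (c == '\r') {
            result += "\\r";
        } else if (c == '\n') {
            result += "\\n";
        } else {
            result += c;
        }
    }
    return result;
}
```

Printing show_line_endings(line) inside the reading loop should show a leading \r on the affected lines if the theory above is right.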

You can easily verify this theory by replacing all '\r' characters before creating the std::istringstream (I slipped an extra i in there as I don't see why a read/write stream should be created):

std::replace(line.begin(), line.end(), '\r', '@');
std::istringstream linestream(line);

With this change I would expect that the output of the first word of all but the first line would look like this:

Value:   @Sale
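
The check can also be tried without the actual file. The sketch below assumes the lines are separated by "\n\r" and uses an in-memory std::istringstream in place of the std::ifstream; the function name visible_fields is hypothetical and chosen only for illustration:

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `input` into lines and then into comma-separated
// fields, replacing every '\r' with '@' first so that it becomes visible.
std::vector<std::string> visible_fields(const std::string& input) {
    std::istringstream file(input);  // stands in for the std::ifstream
    std::vector<std::string> fields;
    for (std::string line; std::getline(file, line); ) {
        // Make carriage returns visible before splitting the line.
        std::replace(line.begin(), line.end(), '\r', '@');
        std::istringstream linestream(line);
        for (std::string value; std::getline(linestream, value, ','); ) {
            fields.push_back(value);
        }
    }
    return fields;
}
```

For an input such as "Sale,1\n\rSale,2" the third field comes out as "@Sale", which matches the expected output shown above.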

The easiest way to fix the problem is to simply skip leading whitespace when reading the line. The corresponding code excerpt would look like this:

std::ifstream file("test1.csv"); 
for (std::string line; std::getline(file >> std::ws, line); ) {
    std::istringstream   linestream(line);
    for (std::string value; std::getline(linestream, value, ','); ) {
        std::cout << "Value:   " << value << '\n';
    } // for
}
std::cout << "Done Procesing\n";

The magic is the addition of >> std::ws when reading the line, which simply skips all leading whitespace (a '\r' counts as whitespace). The code also removes the inappropriate use of std::endl. If the first word on each line may include leading whitespace you would need a different approach, probably removing the '\r' characters before creating linestream, e.g., using

line.erase(std::remove(line.begin(), line.end(), '\r'), line.end());
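
Put together, this second approach might look roughly like the sketch below. The function name split_csv_line is an assumption made for illustration, and the code only handles plain comma-separated values (no quoted fields):

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: strips every '\r' from `line` and splits the rest on
// commas. Leading spaces in a field are kept, unlike with `>> std::ws`.
std::vector<std::string> split_csv_line(std::string line) {
    // Erase-remove idiom: drop all carriage returns from the line.
    line.erase(std::remove(line.begin(), line.end(), '\r'), line.end());
    std::istringstream linestream(line);
    std::vector<std::string> values;
    for (std::string value; std::getline(linestream, value, ','); ) {
        values.push_back(value);
    }
    return values;
}
```

Calling split_csv_line(line) for each line read with std::getline(file, line) would then yield the fields without any stray carriage return.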
Dietmar Kühl