
I have been struggling to figure out an issue I get when the last column of a line in a CSV file is empty. What seems to happen is that if the last column is empty, the code either skips it or gives incorrect data (not really sure yet). I've been working on this for a few days and am now looking for any ideas on resolving it.

My code that reads the csv file gathers the data and places it into a 2D vector. Here's the code for the first part:

bool valid = false;
std::string file = ParamsD.bulkUploadFile; //"Files\\BulkUpload.csv"
std::vector<std::vector <std::string>> buffer;      //buffer to store all the data read from the file
std::ifstream configFile;

configFile.exceptions(std::ifstream::badbit);

//Read the CSV file into a buffer
try
{
    std::string line;
    configFile.open(file.c_str(), std::ifstream::in);
    while(configFile.is_open())
    {
        if (!std::getline(configFile, line))
            break;
        std::istringstream ss(line);
        std::vector<std::string> record;
        while (ss)
        {
            std::string s;
            if (!std::getline(ss, s, ','))
                break;
            record.push_back(s);
        }
        buffer.push_back(record);
    }
}
catch (const std::ifstream::failure &)
{
    throw; // rethrow the original exception to the caller
}

The second part of the function reads the buffer and places the information into a struct that is used in other parts of the program. There is a lot of repetition, so to keep it shorter and easier to read I will show just parts of it.

for (int i = 0; i < buffer.size(); i++)
{
    for (int j = 0; j < buffer[i].size(); j++)
    {
        if (j == 0) //first column
        {
            std::string s;
            s = buffer[i][j];
            if (s.size() == 0)
                s = "NULL";
            CSVFile.passwordName.push_back(s);  
        } 
        //...if(j==1) through (j==27)...//
        if (j == 28) //Last column
        {
            std::string s;
            s = buffer[i][j];
            if (s.size() == 0)
                s = "NULL";
            CSVFile.extraPass3F.push_back(s);
        }
    }
}

valid = true;
return valid;

So as a workaround for the time being, I just place the word "NULL" in the last column and the code works as intended. Could the issue be that I am not handling "\n" when reading the line at "if(!std::getline(ss, s, ','))"?

Any help would be appreciated. Thanks in advance

mmetalfan
  • It would help if you posted some of your input and describe the output - ideally for a case where it fails to perform as expected. – hnefatl Aug 07 '17 at 17:47
  • My input is formatted as: Password_name,Safe,SafeDescription,Password,PolicyID,Address,UserName,Type,LogonDomain,ADGroup,Comment,Description,Group,GroupPlatformID,CPMDisabled,DisabledReason,ResetImmediately,DSN,ClientDN,ServerDN,ExtraPass1Name,ExtraPass1Safe,ExtraPass1Folder,ExtraPass2Name,ExtraPass2Safe,ExtraPass2Folder,ExtraPass3Name,ExtraPass3Safe,ExtraPass3Folder. Those are the headers for the columns. If I don't place "NULL" in the last column, the program crashes with the error "Bad Allocation". – mmetalfan Aug 07 '17 at 17:48
  • The last field in a CSV record is not terminated by a comma, so this: `if (!std::getline(ss, s, ','))` doesn't do what you want. –  Aug 07 '17 at 17:51
  • So a check for "\n" will probably be needed? – mmetalfan Aug 07 '17 at 17:55
  • Writing a CSV parser is actually somewhat more complicated than it might at first seem. You should probably look at existing parsers to get some ideas of the problems - I have a simple one at https://bitbucket.org/neilb/csvparse/src –  Aug 07 '17 at 17:58

2 Answers

The issue is that extracting from a stringstream isn't really the same as tokenising. The "" at the end of "a," is a valid token when your delimiter is a comma, but a stringstream will consume the comma and reach the end of the stream, and since there's no more data on the stream it reports that it's out of data instead of yielding an empty token.
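To make that concrete, here is a minimal sketch that mirrors the question's inner loop. The helper name `fieldsViaGetline` is hypothetical and only wraps the loop so its result can be inspected.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): collects fields the same way
// the question's inner loop does, using std::getline with ',' as delimiter.
std::vector<std::string> fieldsViaGetline(const std::string& line)
{
    std::istringstream ss(line);
    std::vector<std::string> record;
    std::string s;
    // After the last ',' is consumed there is nothing left to extract, so the
    // next std::getline fails and the trailing empty field is never stored.
    while (std::getline(ss, s, ','))
        record.push_back(s);
    return record;
}
```

Calling `fieldsViaGetline("a,")` gives a single field, while `fieldsViaGetline(",a")` gives two, so only an empty field at the very end of the line goes missing.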

There doesn't seem to be a particularly nice built-in way of getting all the empty tokens.

Offering up a standard string split function:

std::vector<std::string> split(const std::string &in, const char delim)
{
    std::vector<std::string> results;
    std::string working;
    for (const char c : in)
    {
        if (c == delim)
        {
            results.push_back(working);
            working.clear();
        }
        else
            working.push_back(c);
    }
    results.push_back(working);
    return results;
}

This will return all the tokens, including empty ones.

You can then just read lines from the file with getline, as you are already doing (this discards the newline at the end of each line, since it is the delimiter), and pass them to this function. An empty last column simply comes back as an empty string token. For example:

split("a,", ',') == { "a", "" }
split("a,a", ',') == { "a", "a" }

Alternatively, std::getline sets the eof bit if it reaches the end of the stream during its operation. This means you could replace:

if (!std::getline(ss, s, ','))
    break;

with

if (!std::getline(ss, s, ','))
{
    if (ss.eof()) // At the end of the stream, insert a blank and move on
        record.push_back("");
    break;
}

Less elegant imo, but it is a smaller adjustment.
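One subtlety worth knowing: the read that fails after the last field also finds the eof bit set when the line does not end in a comma, so the eof check appends an extra empty field to lines such as "a,b" as well. With a fixed number of columns the extra entry sits past the last index that is read, so it goes unnoticed. The sketch below compares the eof check with a variant that looks at the line's last character instead; both function names are hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: the question's loop with the eof check applied.
std::vector<std::string> fieldsWithEofCheck(const std::string& line)
{
    std::istringstream ss(line);
    std::vector<std::string> record;
    while (ss)
    {
        std::string s;
        if (!std::getline(ss, s, ','))
        {
            if (ss.eof()) // true for "a," and also for "a,b"
                record.push_back("");
            break;
        }
        record.push_back(s);
    }
    return record;
}

// Hypothetical alternative: add the trailing empty field only when the line
// is empty or really ends with the delimiter.
std::vector<std::string> fieldsWithTrailingCheck(const std::string& line)
{
    std::istringstream ss(line);
    std::vector<std::string> record;
    std::string s;
    while (std::getline(ss, s, ','))
        record.push_back(s);
    if (line.empty() || line.back() == ',')
        record.push_back("");
    return record;
}
```

For "a," both versions return two fields. For "a,b" the eof version returns three fields and the trailing-character version returns two, which matches what split produces.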

hnefatl
  • Your suggestion worked as expected! That was what I needed. Adding `if (ss.eof()) record.push_back("");` before the `break;` fixed the issue. Thanks for all the fast comments, everyone. – mmetalfan Aug 07 '17 at 18:57

In my previous answer I forgot that your ss will not contain \n, because it has already been removed by the first getline(configFile,line). How about you try std::istream::getline instead of std::getline on ss? I have not checked myself, but it should handle eof, so in this case you should also check if eofbit is set.

Piotr G
  • Both versions set an `eof` bit, but `std::istream::getline` operates on `char *`, not `std::string`. See [here](http://www.cplusplus.com/reference/string/string/getline/) versus [here](http://www.cplusplus.com/reference/istream/istream/getline/). The `eof` behaviour is a little less obvious on the `std::getline` version. – hnefatl Aug 07 '17 at 18:40
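
For reference, a small sketch of what the member-function version looks like with a `char` buffer. The helper name `fieldsViaMemberGetline` and the 256-byte buffer size are assumptions made for illustration only.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: same loop as the question, but using the member
// function std::istream::getline, which writes into a char array.
std::vector<std::string> fieldsViaMemberGetline(const std::string& line)
{
    std::istringstream ss(line);
    std::vector<std::string> record;
    char buf[256]; // assumed upper bound; a longer field sets failbit and ends the loop
    while (ss.getline(buf, sizeof buf, ','))
        record.push_back(buf);
    return record;
}
```

On "a," this also returns a single field, so switching to the member function does not recover the trailing empty column by itself; an eof or trailing-delimiter check is still needed.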