2

This code is to parse a csv file, but it results in segmentation fault. I could see the similar code here: Nested strtok function problem in C

They look same but my code results in segmentation fault. Why?

#include <cstdlib>
#include <string.h>
#include <stdio.h>
#include <vector>

using namespace std;

struct inputTuple{
    int user, item, rating;
};

int main(void)
{
    char xdata[] = "1,88,0,874965758;1,2,1,876893171;1,99,1,878542960;";
    vector<inputTuple> input;

            int maxUser = 0, maxItem = 0;
        int user, item, rating;

        char *end_str;
        char *data_point = strtok_r(xdata, ";", &end_str);

        while(data_point != NULL) {
            char *end_attr;
            char *data_point_attr = strtok_r(data_point, ",", &end_attr);
            while(data_point_attr != NULL) {

                user = atoi(data_point_attr);
                data_point_attr = strtok_r(NULL, ",", &end_attr);
                item = atoi(data_point_attr);
                data_point_attr = strtok_r(NULL, ",", &end_attr);
                rating = atoi(data_point_attr);
                strtok_r(NULL, ",", &end_attr);
                input.push_back({user, item, rating});
                maxUser = max(maxUser, user);
                maxItem = max(maxItem, item);
                }
            data_point = strtok_r(NULL, ";", &end_str);
        }
    return 0;
}
NathanOliver
  • 171,901
  • 28
  • 288
  • 402
Kumar Roshan Mehta
  • 3,078
  • 2
  • 27
  • 50

3 Answers3

2

Jean-Francois is correct. Just check the value before calling atoi like below. You need to do that every time you are calling atoi

if(data_point_attr != NULL)
    user = atoi(data_point_attr);
Abhijit Pritam Dutta
  • 5,521
  • 2
  • 11
  • 17
2

your inner loop is not needed AND toxic to your program.

The inner loop extracts 3 tokens, then continues, extract the 4th token but reaches NULL in the middle, and atoi is passed NULL: segfault.

You only need one loop (added assert statements for sanity though):

    while(data_point != NULL) {
        char *end_attr;
        char *data_point_attr = strtok_r(data_point, ",", &end_attr);

            user = atoi(data_point_attr);
            data_point_attr = strtok_r(NULL, ",", &end_attr);
            assert(data_point_attr != NULL);
            item = atoi(data_point_attr);
            data_point_attr = strtok_r(NULL, ",", &end_attr);
            assert(data_point_attr != NULL);
            rating = atoi(data_point_attr);
            strtok_r(NULL, ",", &end_attr);
            input.push_back({user, item, rating});
            maxUser = max(maxUser, user);
            maxItem = max(maxItem, item);

        data_point = strtok_r(NULL, ";", &end_str);
    }

A better solution would be to use a real C++ solution to split the strings.

Jean-François Fabre
  • 137,073
  • 23
  • 153
  • 219
2

According to Valgrind, your program is failing here:

                item = atoi(data_point_attr);

, as a result of the function argument being NULL. This seems to be consistent with Jean-François's prediction. A bit of debugging with gdb shows that it occurs on the second loop iteration, though you could have determined that even via print-statement debugging. And just a bit more debugging with gdb shows what's going on: your inner loop is faulty, because the loop condition does not become false when you reach the end of a record.

And what do you need that inner loop for, anyway? Nothing. You are reading each value via a separate statement, so you've effectively unrolled the loop vs. what one would need in a program to consume arbitrary CSV. Do check the result of each strtok_r() call, but drop the unneeded inner loop.

John Bollinger
  • 160,171
  • 8
  • 81
  • 157