I have a .csv file that reads like:

SKU,Plant,Qty
40000,ca56,1245
40000,ca81,12553.3
40000,ca82,125.3
45000,ca62,0
45000,ca71,3
45000,ca78,54.9

Note: This is my example but in reality this has about 500,000 rows and 3 columns.

I am trying to convert these entries into a 2D array so that I can then manipulate the data. You'll notice that in my example I just set a small 10x10 matrix A to try and get this example to work before moving on to the real thing.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

const char *getfield(char *line, int num);

int main() {
    FILE *stream = fopen("input/input.csv", "r");
    char line[1000000];
    int A[10][10];
    int i, j = 0;

    //Zero matrix
    for (i = 0; i < 10; i++) {
        for (j = 0; j < 10; j++) {
            A[i][j] = 0;
        }
    }

    for (i = 0; fgets(line, 1000000, stream); i++) {
        while (j < 10) {
            char *tmp = strdup(line);
            A[i][j] = getfield(tmp, j);
            free(tmp);
            j++;
        }
    }
    //print matrix
    for (i = 0; i < 10; i++) {
        for (j = 0; j < 10; j++) {
            printf("%s\t", A[i][j]);
        }
        printf("\n");
    }
}

const char *getfield(char *line, int num) {
    const char *tok;
    for (tok = strtok(line, ",");
         tok && *tok;
         tok = strtok(NULL, ",\n"))
    {
        if (!--num)
            return tok;
    }
    return 0;
}

It prints only "null" errors, and it is my belief that I am making a mistake related to pointers on this line: A[i][j] = getfield(tmp, j). I'm just not really sure how to fix that.

This work is based almost entirely on this question: Read .CSV file in C. Any help in adapting it would be very much appreciated, as it's been a couple of years since I last touched C or external files.

chqrlie
Matthew R
  • 1) `int A[10][10];` --> `const char *A[10][10];` – BLUEPIXY Apr 26 '17 at 21:50
  • `getfield()` doesn't make any copy, it just chops up and returns a part of `tmp`. Then you assign that part of `tmp` to a location in the matrix, then you free `tmp`. (which as @BLUEPIXY pointed out isn't even defined to hold strings) If you're going to store pointers in `A`, you'll have to keep them allocated till you're done with them for starters. (I haven't read past this line of code yet, so likely more to come...) – ebyrob Apr 26 '17 at 21:52
  • 3) `if (!--num)` --> `if (!num--)` – BLUEPIXY Apr 26 '17 at 22:01
  • 4) You need to reset `j`, like `j = 0; while(j<10){`, or use `for` – BLUEPIXY Apr 26 '17 at 22:01
  • Next, `getfield()` appears to quit processing correctly after the first call since the string is already chopped up. And, as @BLUEPIXY points out this time: `j` is not reset at all, so the double loop will only execute the `i` portion once. On top of that, `j` is not in any way related to how many tokens are found, so it'll just go to `9`, finding nothing after the first row. (and only possibly 1 item in that first row due to the re-entry problem in `getfield()`). Your question seems more of the form "how do I do this?" than "what is wrong?", so I'll look for something, but CSV is tough. – ebyrob Apr 26 '17 at 22:08
  • You're right, this is definitely more of a "how do I do this" question. I've never worked with CSV before outside of Excel. By incorporating your changes I'm now at the point where the output has "aGd" in the first column, "d" in the second, and 54.9 in the last – Matthew R Apr 26 '17 at 22:15
  • [DEMO](http://ideone.com/9FhKL6) – BLUEPIXY Apr 26 '17 at 22:49
  • @BLUEPIXY shouldn't that be an "answer"? And looks like you win the foot race. – ebyrob Apr 26 '17 at 23:04
  • @ebyrob Basically these questions are mostly off-topics. – BLUEPIXY Apr 26 '17 at 23:06
  • Works like a charm! Thank you for the help! – Matthew R Apr 26 '17 at 23:10
  • Do I need to award you the answer or do you need to submit as an answer first? – Matthew R Apr 26 '17 at 23:14
  • @MatthewR Just remember that CSV parsing is tough. You may have commas in the data, or quotes, or commented out lines, or no header, or some other character set than what you expect. – ebyrob Apr 26 '17 at 23:27
  • Will watch out for that, thanks – Matthew R Apr 27 '17 at 15:41

2 Answers

It looks like commenters have already helped you find a few errors in your code. However, the problems run deeper than that. One of the biggest is that every cell of your table holds a string. A C string is itself a char array, so one dimension is already used up by the characters: a 10x10 table of strings needs either a third dimension or a table of pointers, and int A[10][10] provides neither.

It would probably be better to just use a struct like this:

struct csvTable
{
    char sku[10];
    char plant[10];
    char qty[10];
};

That will also allow you to set your columns to the right data types (it looks like SKU could be an int, but I don't know the context).

Here's an example of that implementation. I apologize for the mess, it's adapted on the fly from something I was already working on.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Based on your estimate
// You could make this adaptive or dynamic

#define rowNum 500000

struct csvTable
{
    char sku[10];
    char plant[10];
    char qty[10];
};

// Declare table
struct csvTable table[rowNum];

int main()
{
    // Load file
    FILE* fp = fopen("demo.csv", "r");

    if (fp == NULL)
    {
        printf("Couldn't open file\n");
        return 1;
    }

    // Read until the table is full or the file runs out of lines
    int rows = 0;
    char entry[100];
    while (rows < rowNum && fgets(entry, sizeof entry, fp) != NULL)
    {
        // Split on the line ending too, so qty does not keep the trailing "\n"
        char *sku = strtok(entry, ",\r\n");
        char *plant = strtok(NULL, ",\r\n");
        char *qty = strtok(NULL, ",\r\n");

        if (sku != NULL && plant != NULL && qty != NULL)
        {
            // snprintf truncates instead of overflowing the 10-byte fields
            snprintf(table[rows].sku, sizeof table[rows].sku, "%s", sku);
            snprintf(table[rows].plant, sizeof table[rows].plant, "%s", plant);
            snprintf(table[rows].qty, sizeof table[rows].qty, "%s", qty);
        }
        else
        {
            // Malformed line: store empty strings
            table[rows].sku[0] = '\0';
            table[rows].plant[0] = '\0';
            table[rows].qty[0] = '\0';
        }
        rows++;
    }
    fclose(fp);

    // Prove that the process worked
    for (int printCounter = 0; printCounter < rows; printCounter++)
    {
        printf("Row %d: column 1 = %s, column 2 = %s, column 3 = %s\n",
            printCounter + 1, table[printCounter].sku,
            table[printCounter].plant, table[printCounter].qty);
    }

    // Wait for keypress to exit
    getchar();

}
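
On the point about giving the columns their proper data types, here is a minimal sketch of what that could look like. It assumes SKU is always a whole number and Qty is always a decimal number, which the sample data suggests but the question does not confirm. The names typedRow and parseRow are made up for this illustration and are not part of the code above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical typed row: SKU as an integer, Qty as a floating-point number. */
struct typedRow
{
    long sku;
    char plant[10];
    double qty;
};

/* Parse one CSV line such as "40000,ca56,1245\n" into *row.
   Returns 1 on success and 0 if the line is malformed.
   The line is modified in place, because strtok writes terminators into it. */
int parseRow(char *line, struct typedRow *row)
{
    char *sku = strtok(line, ",\r\n");
    char *plant = strtok(NULL, ",\r\n");
    char *qty = strtok(NULL, ",\r\n");
    char *end;

    if (sku == NULL || plant == NULL || qty == NULL)
        return 0;                   /* fewer than three fields */
    if (strlen(plant) >= sizeof row->plant)
        return 0;                   /* plant code would not fit */

    row->sku = strtol(sku, &end, 10);
    if (end == sku || *end != '\0')
        return 0;                   /* SKU was not a whole number */

    row->qty = strtod(qty, &end);
    if (end == qty || *end != '\0')
        return 0;                   /* Qty was not a number */

    strcpy(row->plant, plant);      /* length was checked above */
    return 1;
}
```

A side effect of validating the numbers is that the header line SKU,Plant,Qty is rejected (parseRow returns 0 for it), so a reading loop can simply skip any line that fails to parse.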
Scott Forsythe

There are multiple problems in your code:

  • In the second loop, you do not stop reading the file after 10 lines, so you would try and store elements beyond the end of the A array.
  • You do not reset j to 0 at the start of the while (j < 10) loop. j happens to have the value 10 at the end of the initialization loop, so you effectively do not store anything into the matrix.
  • The matrix A should be a 2D array of char *, not int, or potentially an array of structures.
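
For reference, the three points above can be addressed while keeping the original 2D layout. This is only a sketch under a few assumptions: three columns per row, lines shorter than 200 bytes, and no quoted fields or embedded commas. The names readTable, freeTable, copyString, MAX_ROWS and MAX_COLS are invented for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ROWS 10
#define MAX_COLS 3

/* Copy a string into freshly allocated memory, so the cell stays valid
   after the line buffer is reused for the next row. */
static char *copyString(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p != NULL)
        strcpy(p, s);
    return p;
}

/* Read at most MAX_ROWS lines from stream into A and return the number of
   rows stored. Every cell is its own allocation; missing cells are NULL. */
int readTable(FILE *stream, char *A[MAX_ROWS][MAX_COLS])
{
    char line[200];
    int i = 0;

    /* Stop after MAX_ROWS lines so nothing is written past the end of A. */
    while (i < MAX_ROWS && fgets(line, sizeof line, stream) != NULL) {
        char *tok = strtok(line, ",\r\n");
        for (int j = 0; j < MAX_COLS; j++) {    /* j restarts at 0 per row */
            A[i][j] = tok ? copyString(tok) : NULL;
            if (tok != NULL)
                tok = strtok(NULL, ",\r\n");
        }
        i++;
    }
    return i;
}

/* Release every cell that readTable allocated. */
void freeTable(char *A[MAX_ROWS][MAX_COLS], int rows)
{
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < MAX_COLS; j++)
            free(A[i][j]);
}
```

When printing, each cell can be passed to printf with %s as long as NULL cells are checked first, for example A[i][j] ? A[i][j] : "".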

Here is a simpler version with an allocated array of structures:

#include <stdio.h>
#include <stdlib.h>

typedef struct item_t {
    char SKU[20];
    char Plant[20];
    char Qty[20];
} item_t;

int main(void) {
    FILE *stream = fopen("input/input.csv", "r");
    char line[200];
    int size = 0, len = 0, i;
    char c;     // must be a char: sscanf's %c stores through a char *
    item_t *A = NULL;

    if (stream) {
        while (fgets(line, sizeof(line), stream)) {
            if (len == size) {
                size = size ? size * 2 : 1000;
                A = realloc(A, sizeof(*A) * size);
                if (A == NULL) {
                    fprintf(stderr, "out of memory for %d items\n", size);
                    return 1;
                }
            }
            if (sscanf(line, "%19[^,\n],%19[^,\n],%19[^,\n]%c",
                       A[len].SKU, A[len].Plant, A[len].Qty, &c) != 4
            ||  c != '\n') {
                fprintf(stderr, "invalid format: %s\n", line);
            } else {
                len++;
            }
        }
        fclose(stream);

        //print matrix
        for (i = 0; i < len; i++) {
            printf("%s,%s,%s\n", A[i].SKU, A[i].Plant, A[i].Qty);
        }
        free(A);
    }
    return 0;
}
chqrlie