
I have to get node ids from DIMES ASNodes.csv (http://netdimes.org/new/?q=node/65) files. File looks like this:

6067,UNKNOWN,2007-02-03 10:03:53.0,2007-01-02 02:54:13.0,12,6,0
29287,UNKNOWN,2007-02-03 21:11:07.0,2007-01-02 07:33:35.0,1,0,0
...

So far I have come up with this code, but it doesn't work quite right. Although it prints all the numbers I need, it also prints each node id twice and sometimes prints zeroes in between. Thanks for any ideas.

void loadNodes(const char* filename)
{
    FILE* nodes = fopen(filename, "r");

    unsigned int id = 0;
    char line[64];

    while (fgets(line, sizeof(line), nodes) != NULL) {
        sscanf(line, "%u%*[^\n]", &id);
        printf("id = %u\n", id);
    }
    fclose(nodes);
}


Zdnk.k
  • 3
    You are supposed to provide all information **in the question itself**. Not just as links. And post text **as text**. – too honest for this site May 08 '16 at 19:18
  • 1
    The `csv` doesn't just have numbers. If the csv has pre-defined format, you probably should read the entire line using `sscanf` and use format specifiers to capture each column individually. – Mukul Gupta May 08 '16 at 19:20

2 Answers


I think the trouble is that each of your lines has 63 characters plus a newline. With a 64-byte buffer, fgets() stores at most 63 characters plus the terminating null byte, so the first call reads up to, but not including, the newline (and you process that and get the correct number). The next fgets() then reads only the newline left behind by the previous call. sscanf() converts nothing from that fragment, so id keeps its previous value and is printed again; it is surprising that you get zeros rather than a repeat of the previous number.

Here's your code converted into an MCVE (Minimal, Complete, and Verifiable Example) main() program that reads from standard input (which saves me from having to validate, open and close files):

#include <stdio.h>

int main(void)
{
    unsigned id = 0;
    char line[64];

    while (fgets(line, sizeof(line), stdin) != NULL)
    {
        printf("Line: [%s]\n", line);
        sscanf(line,"%u", &id);
        printf("id = %u\n", id);
    }

    return 0;
}

Note the diagnostic printing of the line just read. The code should really check the return value from sscanf(). (There was no virtue in skipping the trailing debris, so I removed that from the format string.)

Given the data file (data):

6067,UNKNOWN,2007-02-03 10:03:53.0,2007-01-02 02:54:13.0,12,6,0
29287,UNKNOWN,2007-02-03 21:11:07.0,2007-01-02 07:33:35.0,1,0,0

The output I get from so.37103830 < data is:

Line: [6067,UNKNOWN,2007-02-03 10:03:53.0,2007-01-02 02:54:13.0,12,6,0]
id = 6067
Line: [
]
id = 6067
Line: [29287,UNKNOWN,2007-02-03 21:11:07.0,2007-01-02 07:33:35.0,1,0,0]
id = 29287
Line: [
]
id = 29287
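
As noted above, the code should check the return value from sscanf(). A minimal sketch of that check follows; the helper names parse_node_id() and print_node_ids() are hypothetical and not part of the original code. With the check in place, the newline-only fragments shown in the output no longer produce a repeated id.

```c
#include <stdio.h>

/* Hypothetical helper: extracts the leading node id from one CSV line.
 * Returns 1 on success, 0 if the line does not start with a number
 * (for example, a leftover "\n" fragment from a truncated read). */
int parse_node_id(const char *line, unsigned *id)
{
    /* sscanf() returns the number of successful conversions, or EOF
     * if the input ends before the first conversion. */
    return sscanf(line, "%u", id) == 1;
}

/* Reads lines from fp and prints only the ids that were actually parsed.
 * Returns the number of ids printed. */
int print_node_ids(FILE *fp)
{
    char line[64];
    unsigned id;
    int count = 0;

    while (fgets(line, sizeof(line), fp) != NULL)
    {
        if (parse_node_id(line, &id))
        {
            printf("id = %u\n", id);
            count++;
        }
    }
    return count;
}
```

This only skips fragments that do not begin with a number. If a line is longer than the buffer and its leftover tail happens to start with digits, that tail would still be parsed as an id, so the check complements a sufficiently large buffer rather than replacing it.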

Avoiding the problem

The simplest fix is to use a longer buffer length; I normally use 4096 when I don't care about what happens if a really long line is read, but you might decide that 128 or 256 is sufficient.

Otherwise, I use POSIX getline() which will read arbitrarily long lines (subject to not running out of memory).

With a longer line length, I get the output:

Line: [6067,UNKNOWN,2007-02-03 10:03:53.0,2007-01-02 02:54:13.0,12,6,0
]
id = 6067
Line: [29287,UNKNOWN,2007-02-03 21:11:07.0,2007-01-02 07:33:35.0,1,0,0
]
id = 29287
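
The getline() alternative mentioned above could look roughly like this. It is a sketch that assumes a POSIX system; the function name print_node_ids_getline() is hypothetical. Because getline() grows its buffer as needed, each call returns a whole line regardless of length, so no line is split across two reads.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes getline() on POSIX systems */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: prints the leading id of every line in fp.
 * Returns the number of ids printed. */
int print_node_ids_getline(FILE *fp)
{
    char *line = NULL;   /* getline() allocates and grows this buffer */
    size_t cap = 0;
    unsigned id;
    int count = 0;

    while (getline(&line, &cap, fp) != -1)
    {
        /* Only print when a number was actually converted. */
        if (sscanf(line, "%u", &id) == 1)
        {
            printf("id = %u\n", id);
            count++;
        }
    }
    free(line);          /* the buffer allocated by getline() must be freed */
    return count;
}
```

It can be called with stdin or with a FILE* returned by fopen(), after checking that fopen() did not return NULL.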
Jonathan Leffler

Assuming you only need the first column from the file (since you mention node ids), you could use:

unsigned int node_id;
char str[100];
/* %99[^\n] caps the rest-of-line read so it cannot overflow str */
while (scanf("%u,%99[^\n]", &node_id, str) == 2) {
    printf("%u\n", node_id);
}


Mukul Gupta