
I have two .csv files and I need to read each file in full, field by field. CSV files hold data separated by commas, so I can't just use fgets.
I need to read all the data, but I don't know how to move on to the next line.

Here is what I've done so far:

int main()
{
   FILE *arq_file;
   arq_file = fopen("file.csv", "r");

   if(arq_file == NULL){
      printf("Not possible to read the file.");
      exit(0);
   }

   while( !feof(arq_file) ){
   fscanf(arq_file, "%i %lf", &myStruct[i+1].Field1, &myStruct[i+1].Field2);  
   }

   fclose(arq_file);
   return 0;
}  

It gets stuck in an infinite loop because it never advances to the next line.
How can I reach the line below the one I just read?

Update: File 01 Example

1,Alan,123,
2,Alan Harper,321
3,Jose Rendeks,32132
4,Maria da graça,822282
5,Charlie Harper,9999999999  

File 02 Example

1,320,123
2,444,321
3,250,123,321
3,3,250,373,451
2,126,621
1,120,320
2,453,1230
3,12345,0432,1830
Iharob Al Asimi
PlayHardGoPro
  • [`while (!feof(file))` is always wrong](http://stackoverflow.com/a/26557243/1983495). Can you post sample input/output? And by the way, you should copy and paste the code, because in the posted code `i` isn't declared, so it is not an [MCVE](http://stackoverflow.com/help/mcve). – Iharob Al Asimi Mar 03 '15 at 22:28
  • How many data elements are in each line? Are the elements separated by commas (as the name CSV suggests) or by some other separator? Do you have to deal with double-quoted fields that might themselves contain commas? Can you show perhaps 5 lines of your data? – Jonathan Leffler Mar 03 '15 at 22:37
  • Incidentally, at minimum you're going to need `"%i , %lf"` as the format string; if the double value is also followed by a comma, you need another blank and comma after the `%lf`. The numeric inputs will skip white space anyway, which includes newlines. (The space after the comma is optional; the space before is not really optional, though if you're confident that there are never blanks after the number and before the comma, it becomes optional.) You might do better to get a CSV-reading library. [The Practice of Programming](http://cm.bell-labs.com/cm/cs/tpop/) has code for the job. – Jonathan Leffler Mar 03 '15 at 22:39
  • @JonathanLeffler I just updated the files example. Always separated by comma only. No double-quoted fields. – PlayHardGoPro Mar 03 '15 at 22:45
  • @PlayHardGoPro this looks like a task to be done with `fgets()` + parsing the line. – Iharob Al Asimi Mar 03 '15 at 22:49
  • @iharob But then how would I separate the data at the commas? I need to put its values into a struct after reading it. – PlayHardGoPro Mar 03 '15 at 22:50
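
To illustrate the two comments above about `feof()` and the format string: in the question's loop, `fscanf()` stops at the first comma it cannot match and consumes nothing more, so end of file is never reached and `feof()` never becomes true. A minimal sketch of the usual fix is to loop on the return value of `fscanf()` instead. It assumes every line is exactly `integer,real number`, which is simpler than the sample files above; `struct record` and `read_records()` are hypothetical names, since the question's real struct is not shown. `%d` is used rather than `%i` so that a field such as `0432` is not read as octal.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record type; the struct used in the question is not shown. */
struct record {
    int    field1;
    double field2;
};

/*
 * Read up to max "int,double" records from fp into out[].
 * Returns the number of records stored.
 */
size_t read_records(FILE *fp, struct record *out, size_t max)
{
    size_t n = 0;

    assert(fp != NULL && out != NULL);

    /* Loop on fscanf()'s return value instead of feof(): the loop ends
     * at end of file and also on the first line that does not match. */
    while (n < max &&
           fscanf(fp, "%d ,%lf", &out[n].field1, &out[n].field2) == 2)
        n++;

    return n;
}
```

The newline after each record is skipped by the next `%d`, which ignores leading white space, so no explicit "jump to the next line" is needed for input of this shape.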

2 Answers


I think an example is better than hints. This combines fgets() and strtok(). Other functions, for example strchr(), could also work, but this way is easier, and since I just wanted to point you in the right direction, I did it like this:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int
main(void)
{
    FILE  *file;
    char   buffer[256];
    char  *pointer;
    size_t line;

    file = fopen("data.dat", "r");
    if (file == NULL)
     {
        perror("fopen()");
        return -1;
     }

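    line = 0;
    /* Read one line at a time; fgets() returns NULL at end of file. */
    while ((pointer = fgets(buffer, sizeof(buffer), file)) != NULL)
     {
        size_t field;
        char  *token;

        field = 0;
        /* "\n" is a delimiter too, so the last field loses its newline. */
        while ((token = strtok(pointer, ",\n")) != NULL)
         {
            printf("line %zu, field %zu -> %s\n", line, field, token);

            field  += 1;
            /* Pass NULL from now on so strtok() continues with this line. */
            pointer = NULL;
         }
        line += 1;
     }
    fclose(file);
    return 0;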
}

I think it's very clear how the code works and I hope you can understand.
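
One detail worth spelling out is the `pointer = NULL` assignment. The sketch below is only an illustration, and `nth_token()` is a hypothetical helper that is not part of the code above. It calls `strtok()` several times on one line, either passing the line every time or passing `NULL` after the first call, and returns the last token it got.

```c
#include <assert.h>
#include <string.h>

/*
 * Call strtok() `calls` times on the same line and return the last token.
 * If restart is nonzero, the line itself is passed on every call;
 * otherwise NULL is passed after the first call, which is the normal usage.
 */
const char *nth_token(char *line, int calls, int restart)
{
    const char *token = NULL;
    int i;

    assert(line != NULL && calls > 0);

    for (i = 0; i < calls; i++)
        token = strtok((i == 0 || restart) ? line : NULL, ",\n");

    return token;
}
```

With the line `1,jose,123`, three calls with `restart` set to 0 end on `123`, while three calls with `restart` set to 1 return `1` each time, because every call starts again from the beginning of the buffer. That is why a loop that never sets `pointer` to `NULL` keeps seeing the first field and never terminates.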

Iharob Al Asimi
  • Could you explain why it starts printing "11111" in an infinite loop if I remove the *pointer = NULL*? There is also one thing I really can't understand: how am I getting real data from "pointer"? I thought that, when working with pointers, pointer = memory address and *pointer = content stored at that memory address. Am I wrong? – PlayHardGoPro Mar 04 '15 at 02:29
  • @PlayHardGoPro because `strtok()` should receive `NULL` after the first call with the same string. As for the second question, I don't understand what you mean. Getting data where? The `printf()` function does `*pointer` and accesses the data the pointer points to. – Iharob Al Asimi Mar 04 '15 at 02:43
  • @PlayHardGoPro internally `strtok()` uses a static variable to store the state of the tokenization, so you need to tell it that you are still parsing the same string, which is done by passing `NULL` as the first parameter. Note that if you pass another string then it will not work. There is a reentrant POSIX version of `strtok()`, `strtok_r()`, which stores the state in a user-provided pointer, but `strtok()` is OK in simple situations like this. – Iharob Al Asimi Mar 04 '15 at 02:47
  • Got ya, thanks! I already have the data without the commas; when I print token I get "1jose123". I just don't know how to split it to assign its values to my struct. Here is the full code. I'm having some problems passing the value of the token to my struct. If you could check this out: http://pastebin.com/TQeB4G1G – PlayHardGoPro Mar 04 '15 at 02:54
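
On the last comment, about getting the tokens into a struct: a rough sketch, assuming lines shaped like File 01 (`id,name,number`). The layout of `struct person` and the name `parse_person()` are made up for illustration; the real struct is only in the pastebin link. Each token is converted or copied as soon as `strtok()` returns it. The third field is stored as `long long` because a value such as 9999999999 does not fit in a 32-bit `int`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record layout for "id,name,number" lines (File 01). */
struct person {
    long      id;
    char      name[64];
    long long number;
};

/*
 * Split one line in place and fill *out.
 * Returns 1 on success, 0 if the line has fewer than three fields.
 */
int parse_person(char *line, struct person *out)
{
    char *token;

    assert(line != NULL && out != NULL);

    token = strtok(line, ",\n");            /* field 0: id */
    if (token == NULL)
        return 0;
    out->id = strtol(token, NULL, 10);

    token = strtok(NULL, ",\n");            /* field 1: name */
    if (token == NULL)
        return 0;
    /* snprintf() truncates long names and always terminates the string. */
    snprintf(out->name, sizeof(out->name), "%s", token);

    token = strtok(NULL, ",\n");            /* field 2: number */
    if (token == NULL)
        return 0;
    out->number = strtoll(token, NULL, 10);

    return 1;
}
```

In the loop from the answer, this would be called once per line read by `fgets()`, for example `parse_person(buffer, &people[line])`, where `people` is an array you declare yourself. The sketch does no validation of the numeric text; `strtol()` simply yields 0 for a field that is not a number.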

If the same code has to handle both data files, then you're stuck with reading the fields into a string, and subsequently converting the string into a number.

It is not clear from your description whether you need to do something special at the end of line or not — but because only one of the data lines ends with a comma, you do have to allow for fields to be separated by a comma or a newline.

Frankly, you'd probably do OK with using getchar() or equivalent; it is simple.

char buffer[4096];
char *bufend = buffer + sizeof(buffer) - 1;
char *curfld = buffer;
int c;

while ((c = getc(arq_file)) != EOF)
{
    if (curfld == bufend)
        …process overlong field…
    else if (c == ',' || c == '\n')
    {
        *curfld = '\0';
        process(buffer);
        curfld = buffer;
    }
    else
        *curfld++ = c;
}
if (c == EOF && curfld != buffer)
{
    *curfld = '\0';
    process(buffer);
}

However, if you want to go with higher level functions, then you do want to use fgets() to read lines (unless you need to worry about deviant line endings, such as DOS vs Unix vs old-style Mac (CR-only) line endings). Or use POSIX getline() to read arbitrarily long lines. Then split the lines using strtok_r() or equivalent.

char *buffer = 0;
size_t buflen = 0;

while (getline(&buffer, &buflen, arq_file) != -1)
{
     char *posn = buffer;
     char *epos;
     char *token;
     while ((token = strtok_r(posn, ",\n", &epos)) != 0)
     {
         process(token);
         posn = 0;
     }
     /* Do anything special for end of line */
}
free(buffer);

If you think you must use scanf(), then you need to use something like:

char buffer[4096];
char c;

while (fscanf(arq_file, "%4095[^,\n]%c", buffer, &c) == 2)
    process(buffer);

The %4095[^,\n] scan set reads up to 4095 characters that are neither comma nor newline into buffer, and then reads the next character (which must, therefore, either be comma or newline — or conceivably EOF, but that causes problems) into c. If the last character in the file is neither comma nor newline, then you will skip the last field.
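
One possible variation, shown only as a sketch: accepting a return value of 1 as well as 2 picks up a final field that is not followed by a comma or newline. `count_fields()` is a hypothetical helper that just counts the fields it sees, standing in for the `process(buffer)` call above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Count the fields that the scan-set loop manages to read from fp.
 * A return value of 1 from fscanf() means a field was read but end of
 * file was hit before a delimiter; 2 means field plus delimiter.
 */
size_t count_fields(FILE *fp)
{
    char   buffer[4096];
    char   c;
    size_t n = 0;

    assert(fp != NULL);

    while (fscanf(fp, "%4095[^,\n]%c", buffer, &c) >= 1)
        n++;

    return n;
}
```

This still stops at the first empty field, because the scan set has to match at least one character; for input such as `1,,3` the loop ends after the first field.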

Jonathan Leffler
  • Concerning last code and `EOF`, how about `fscanf(arq_file, "%4095[^,\n]%c", buffer, &c) >= 1`? Yet the larger issue with this code is that `%4095[^,\n]` scan set reads 1 to 4095 characters but not 0. +1 for the other approaches. – chux - Reinstate Monica Mar 04 '15 at 00:26
  • @chux: yes, you can play with `fscanf(…) >= 1`, but I'd probably go with one of the other approaches anyway. I like using 4096 as the buffer size in part for its 'shock value' (as against 80 or 256, say). But there isn't a good way to get `fscanf()` to read zero-length fields; yet another reason to go other ways. `strtok_r()` doesn't handle empty fields properly either (multiple adjacent delimiters are ignored). Ultimately, a CSV library is the way to go. – Jonathan Leffler Mar 04 '15 at 00:30
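
For the empty-field problem raised in the comments above, here is a minimal sketch of a splitter that keeps empty fields, using `strcspn()` instead of `strtok_r()`. `split_csv()` is a hypothetical helper and not a replacement for a real CSV library; it assumes there are no quoted fields, as stated in the question's comments, and that the line was read with `fgets()` or `getline()`.

```c
#include <assert.h>
#include <string.h>

/*
 * Split line in place on commas, keeping empty fields.
 * Stores up to max field pointers in fields[] and returns the number of
 * fields stored. A trailing line ending is removed first.
 */
size_t split_csv(char *line, char *fields[], size_t max)
{
    size_t n = 0;
    char  *p;

    assert(line != NULL && fields != NULL);

    line[strcspn(line, "\r\n")] = '\0';     /* drop the line ending */

    p = line;
    while (n < max) {
        size_t len = strcspn(p, ",");       /* length of the current field */

        fields[n++] = p;
        if (p[len] == '\0')                 /* last field on the line */
            break;
        p[len] = '\0';                      /* end the field at the comma */
        p += len + 1;                       /* continue after the comma */
    }
    return n;
}
```

With this approach a line such as `1,Alan,123,` yields four fields, the last one empty, so the caller can decide whether a trailing comma matters. An empty line yields a single empty field.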