4

I have a hard time understanding how you process ascii files in c. I have no problem opening files and closing them or reading files with one value on each line. However, when the data is separated with characters, I really don't understand what the code is doing at a lower level.

Example: I have a file containing names separated by commas that looks like this: "MARY","PATRICIA","LINDA","BARBARA","ELIZABETH","JENNIFER"

I have created an array to store them: char names[6000][20]; And now, my code to process it is while (fscanf(data, "\"%s\",", names[index]) != EOF) { index++; } The code executes for the 1st iteration and names[0] contains the whole file.

How can I separate all the names?

Here is the full code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
    char names[6000][20]; // an array to store 6k names of max length 19
    FILE * data = fopen("./022names.txt", "r");
    int index = 0;
    int nbNames;

    while (fscanf(data, "\"%s\",", names[index]) != EOF) {
        index++;
    }

    nbNames = index;

    fclose(data);

    printf("%d\n", index);
    for (index=0; index<nbNames; index++) {
        printf("%s \n", names[index]);
    }
    printf("\n");

    return 0;
}

PS: I am thinking this might also be because of the data structure of my array.

gemt
  • 2
    scanf is not a parser. Read a line at a time and use string functions or regex to parse the line. – stark Mar 05 '21 at 21:06
  • 3
    `%s` is greedy. It will match as many characters as it can. It will not stop at `,` as you intend it to. `scanf` is not suitable for what you are trying to do. One common way: read a line with `fgets` and then use `strtok` to break it up into words. – kaylum Mar 05 '21 at 21:09
  • One way to improve efficiency if the records in the file are unknown at runtime is to read the lines in the file before actually placing the contents of each line into a buffer. That way, you can replace the huge `char names[6000][20];` with allocated memory fitting only what you need. (I realize efficiency may not be the best word to describe the advantage here. :) ) – ryyker Mar 05 '21 at 21:18
  • 3
    "I have a hard time understanding how you process ascii files in c." That is most likely because you are using `scanf`. – William Pursell Mar 05 '21 at 21:26
  • Note that text stored in a file is directly readable as text and can be parsed using delimiters such as commas (`,`), so the double quotes surrounding the names are not necessary. They don't hurt either, except for file size. Either way, it is easy to parse. – ryyker Mar 05 '21 at 21:27
  • gemt, when data is 1000s of names, still all on 1 line like your example? – chux - Reinstate Monica Mar 05 '21 at 21:58
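
The comments above explain why the original loop fails: `%s` only stops at whitespace, so it swallows quotes and commas alike. If you would rather stay with `fscanf`, a scanset conversion can stop at the closing quote instead. What follows is a minimal sketch rather than a drop-in fix: `read_quoted_names` and `NAME_LEN` are hypothetical names, and it assumes every name is wrapped in double quotes and contains no embedded quote.

```c
#include <stdio.h>

#define NAME_LEN 20  /* assumed: 19 characters plus the terminating '\0' */

/* Reads "NAME","NAME",... from fp into names[0..max-1].
   Returns the number of names stored. */
int read_quoted_names(FILE *fp, char names[][NAME_LEN], int max)
{
    int count = 0;

    /* " \""      skips whitespace, then matches the opening quote
       "%19[^\"]" reads up to 19 characters that are not a quote
       "\""       matches the closing quote */
    while (count < max && fscanf(fp, " \"%19[^\"]\"", names[count]) == 1) {
        count++;

        /* consume the separating comma if there is one */
        int c = fgetc(fp);
        if (c != ',' && c != EOF)
            ungetc(c, fp);
    }
    return count;
}
```

Comparing the return value with 1 instead of `EOF` matters: a failed match returns 0, which the `!= EOF` test would treat as success. With this sketch, reading simply stops at the first name that is longer than 19 characters or not quoted as expected.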

3 Answers

2

If you want a simple solution, you can read the file character by character using fgetc. Since there are no newlines in the file, just ignore quotation marks and move to the next index when you find a comma.

char names[6000][20]; // an array to store 6k names of max length 19
FILE * data = fopen("./022names.txt", "r");
int name_count = 0, current_name_ind = 0;
int c;

while ((c = fgetc(data)) != EOF) {
    if (c == ',') {
        names[name_count][current_name_ind] = '\0';
        current_name_ind = 0;
        ++name_count;
    } else if (c != '"') {
        names[name_count][current_name_ind] = c;
        ++current_name_ind;
    }
}
names[name_count][current_name_ind] = '\0';

fclose(data);
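
Two things are worth noting about the loop above: it never checks that `fopen` succeeded or that the indices stay inside the array, and when it finishes, `name_count` is the index of the last name rather than the number of names. Here is one possible bounds-checked variant, as a sketch only. The function name `read_names_fgetc` and the macro `NAME_LEN` are made up for illustration, and truncating overlong names is an assumption about what you want.

```c
#include <stdio.h>

#define NAME_LEN 20  /* assumed: 19 characters plus the terminating '\0' */

/* Character-by-character parser with bounds checks.
   Returns the number of names stored, or -1 if fp is NULL. */
int read_names_fgetc(FILE *fp, char names[][NAME_LEN], int max_names)
{
    int count = 0;   /* completed names */
    int pos = 0;     /* write position inside the current name */
    int c;           /* int, not char, so EOF can be detected */

    if (fp == NULL)
        return -1;

    while (count < max_names && (c = fgetc(fp)) != EOF) {
        if (c == ',') {
            names[count][pos] = '\0';
            count++;
            pos = 0;
        } else if (c != '"' && c != '\n' && c != '\r') {
            if (pos < NAME_LEN - 1)   /* drop characters that do not fit */
                names[count][pos++] = (char)c;
        }
    }

    /* the last name has no trailing comma, so finish it here */
    if (count < max_names && pos > 0) {
        names[count][pos] = '\0';
        count++;
    }
    return count;
}
```

You would call it as `int n = read_names_fgetc(data, names, 6000);` and then loop from 0 to `n - 1`.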
jackl
  • 2
    `char c` with `c = fgetc()`? That's wrong - `fgetc()` returns `int` for a reason. If you cram the value returned from `fgetc()` into a `char` you can no longer reliably detect `EOF`. – Andrew Henle Mar 05 '21 at 21:29
  • That's right, I edited it. There are more checks that can be added but for the purposes of a simple example I think it's OK now? – jackl Mar 05 '21 at 21:42
  • 1
    Code is not null character terminating the last name read, – chux - Reinstate Monica Mar 05 '21 at 21:56
2

"The code executes for the 1st iteration and names[0] contains the whole file...., How can I separate all the names?"

Regarding the first few statements:

char names[6000][20]; // an array to store 6k names of max length 19
FILE * data = fopen("./022names.txt", "r");

What if there are 6001 names? Or one of the names has more than 19 characters? Or what if there are far fewer than 6000 names?

The point is that by enumerating the tasks you have listed, and mapping out what information is needed to meet them, you can create a better product. The following is derived from your post:

  • Process ascii files in c
  • Read file content that is separated by characters
  • input is a comma separated file, with other delimiters as well
  • Choose a method best suited to parse a file of variable size

As mentioned in the comments under your question, there are ways to write your algorithm so that it flexibly allows for extra-long names or for a variable number of names. This can be done using a few C standard functions commonly used in parsing files. (Although fscanf() has its place, it is not the best option for parsing file contents into array elements.)
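
For comparison with the rest of this answer, one common way to cope with an unknown number of names of unknown length is a list that grows on demand. This is only a sketch of that idea, not the approach this answer goes on to use: the `NameList` type and the `namelist_add` / `namelist_free` helpers are hypothetical names introduced for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* A growable list of heap-allocated strings (hypothetical helper type). */
typedef struct {
    char  **items;
    size_t  count;
    size_t  capacity;
} NameList;

/* Appends a copy of name; returns 0 on success, -1 if allocation fails. */
int namelist_add(NameList *list, const char *name)
{
    if (list->count == list->capacity) {
        /* double the capacity, starting at 16 slots */
        size_t new_cap = list->capacity ? list->capacity * 2 : 16;
        char **tmp = realloc(list->items, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;            /* the old block is still valid */
        list->items = tmp;
        list->capacity = new_cap;
    }

    size_t len = strlen(name) + 1;   /* include the terminator */
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, name, len);
    list->items[list->count++] = copy;
    return 0;
}

/* Releases every string and the pointer array itself. */
void namelist_free(NameList *list)
{
    for (size_t i = 0; i < list->count; i++)
        free(list->items[i]);
    free(list->items);
    list->items = NULL;
    list->count = list->capacity = 0;
}
```

Start with `NameList list = {0};`, call `namelist_add(&list, tok)` for each token, and finish with `namelist_free(&list)`. Each name then takes exactly the memory it needs, at the cost of one allocation per name.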

The approach below addresses the needs enumerated above in two passes over the file: the first counts the names and finds the longest one, and the second copies the names into an array sized to fit.

Following is a complete example of how to implement this, breaking the tasks into functions where appropriate...

Note, code below was tested using the following input file:

names.txt

"MARY","PATRICIA","LINDA","BARBARA","ELIZABETH","JENNIFER",
"Joseph","Bart","Daniel","Stephan","Karen","Beth","Marcia",
"Calmazzothoulumus"


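#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//Prototypes
int    count_names(const char *filename, size_t *count);
size_t filesize(const char *fn);
void   populateNames(const char *fn, int longest, char arr[][longest]);

const char *filename = ".\\names.txt"; //backslash path separator is Windows-specific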

int main(void) 
{
    size_t count = 0;
    int longest = count_names(filename, &count);
    char names[count][longest+1];//VLA - See linked info
                                 // +1 is room for null termination
    memset(names, 0, sizeof names);
    populateNames(filename, longest+1, names);
            
    return 0;
}

//populate VLA with names in file
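void populateNames(const char *fn, int longest, char names[][longest])
{
    const char *delim = "\",\n ";
    char *tok = NULL;
    //buffer holds the whole file plus a null terminator, so fgets never
    //splits a line in the middle of a name (matches count_names)
    size_t size = filesize(fn) + 1;
    char *line = calloc(size, 1);
    FILE * fp = fopen(fn, "r");
    if(fp && line)
    {
        int i=0;
        while(fgets(line, (int)size, fp))
        {
            tok = strtok(line, delim);
            while(tok)
            {
                strcpy(names[i], tok);
                tok = strtok(NULL, delim);
                i++;
            }
        }
    }
    if(fp) fclose(fp);
    free(line);
}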
    
//passes back count of tokens in file, and return longest token
int count_names(const char *filename, size_t *count)
{
    int len=0, lenKeep = 0;
    FILE *fp = fopen(filename, "r");
    if(fp)
    {
        char *tok = NULL;
        char *delim = "\",\n ";
        int cnt = 0;
        size_t fSize = filesize(filename) + 1; //+1 leaves room for the null terminator
        char *buf = calloc(fSize, 1);
        while(buf && fgets(buf, (int)fSize, fp)) //goes to newline for each get
        {
            tok = strtok(buf, delim);
            while(tok)
            {
                cnt++;
                len = strlen(tok);
                if(lenKeep < len) lenKeep = len;
                tok = strtok(NULL, delim);
            }
        }
        *count = cnt;
        fclose(fp);
        free(buf);
    }
    
    return lenKeep;
}

//return file size in bytes (binary read)
size_t filesize(const char *fn)
{
    size_t size = 0;
    FILE*fp = fopen(fn, "rb");
    if(fp)
    {
        fseek(fp, 0, SEEK_END); 
        size = ftell(fp); 
        fseek(fp, 0, SEEK_SET); 
        fclose(fp);
    }
    return size;
}
dreamcrash
ryyker
0

You can use the built-in strtok() function, which is easy to use.

I have used tok+1 instead of tok to skip the opening " and copied strlen(tok) - 2 characters to leave out the closing ".

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
    char names[6000][20]; // an array to store 6k names of max length 19
    FILE * data = fopen("./022names.txt", "r");
    int index = 0;
    int nbNames;
    char *str = (char*)malloc(120000*sizeof(char));
    while (fscanf(data, "%s", str) != EOF) {
        char *tok = strtok(str, ",");
        while(tok != 0){
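            size_t len = strlen(tok);
            if (len >= 2 && len - 2 < sizeof names[0]) {
                strncpy(names[index], tok+1, len-2);
                names[index++][len-2] = '\0'; // strncpy does not add the terminator here
            }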
            tok = strtok(0, ",");
        }
    }

    nbNames = index;

    fclose(data);
    free(str); // just to free the memory occupied by the str variable in the heap.

    printf("%d\n", index);
    for (index=0; index<nbNames; index++) {
        printf("%s \n", names[index]);
    }
    printf("\n");

    return 0;
}

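Also, 120000 is just an estimate of the maximum number of characters in the file: 6000 * 20, as you mentioned. Note that this figure does not count the quotes and commas, so a file that really held 6000 names of 19 characters would need a larger buffer.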
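
As a possible simplification, `strtok` treats every character in its delimiter string as a separator, so passing the quote together with the comma removes the need for the `tok+1` and `strlen(tok)-2` arithmetic. A minimal sketch, assuming the whole line is already in a writable buffer; `split_names` and `NAME_LEN` are hypothetical names, not part of the code above.

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 20  /* assumed: 19 characters plus the terminating '\0' */

/* Splits line in place on quotes, commas and line endings.
   Copies at most max tokens into names and returns how many were stored. */
int split_names(char *line, char names[][NAME_LEN], int max)
{
    const char *delims = "\",\r\n";
    int count = 0;

    for (char *tok = strtok(line, delims);
         tok != NULL && count < max;
         tok = strtok(NULL, delims)) {
        /* snprintf always terminates and truncates names that are too long */
        snprintf(names[count], NAME_LEN, "%s", tok);
        count++;
    }
    return count;
}
```

Keep in mind that `strtok` overwrites the delimiters in `line` and skips empty fields, so this only suits input where every field holds a name.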

risingStark