Reading from a file word by word

Question

I have a custom archive structured as follows:

%list% name1 name2 name3 %list%

%dirs% archive directories %dirs%

%content% name1 path1 content of file1 %content%
%content% name2 path2 content of file2 %content%
%content% name3 path3 content of file3 %content%

%list% contains the names of the files in the archive.
%dirs% contains the names of the directories.
%content% holds the contents of one file.

Since I need to print the content of a specified file, I want to read this archive word by word so that I can identify the %content% tag and the file name.

I know about fscanf(), but it seems to work well only if you already know the archive's pattern.

Is there a C library function or command, like ifstream in C++, that lets me read word by word?

Maybe you should show your `fscanf` code and tell what's wrong with it... Now it's not clear what's so "inefficient" about it. — hyde, May 06 '13 at 14:33
The simplest way to do this is to `fgetc` until you see a whitespace character (i.e. `\r` `\n` `\t` or space). Then, parse the word you just read in. Without more information, it is difficult to help further. — RageD, May 06 '13 at 14:34
You can read word-by-word using `fscanf` (although `fgets` might be simpler). However, much as I hate to suggest this, you might want to look into restructuring your archive with XML and using existing XML libraries (expat, libxml, etc.) to access and modify it. That way you don't have to worry about parsing tags, just content. — John Bode, May 06 '13 at 14:37
@JohnBode, telling someone who can't read a word from file to use XML is probably not a good idea. — Shahbaz, May 06 '13 at 15:26
@Shahbaz: which is part of why I hated to suggest it. However, the structure of their archive strongly argues for it, and with a decent XML API (for suitably loose definitions of "decent") they wouldn't have to worry about buffer sizes, or parsing tags, or other low-level issues that aren't necessarily *difficult* but nevertheless a pain in the ass. — John Bode, May 06 '13 at 16:34
This archive is not for work purposes, though. I wanted to edit the `tar` command, but it was too messy for my C knowledge, so I tried to make a (maybe excessively) simple archive this way. — Andrea Gottardi, May 06 '13 at 23:02

score 17 · Accepted Answer · edited May 23 '17 at 12:32

You can just use fscanf to read one word at a time:

void read_words (FILE *f) {
    char x[1024];
    /* assumes no word exceeds length of 1023 */
    while (fscanf(f, " %1023s", x) == 1) {
        puts(x);
    }
}

If you don't know the maximum length of each word, you can use something similar to this answer to read the complete line, then use sscanf on it with a word buffer as large as the buffer that holds the line. Alternatively, you could use strtok to split the line into words.
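
As a rough sketch of the line-then-strtok approach, the following reads each line into a buffer that grows as needed and then splits it on whitespace. The helper names `read_line` and `for_each_word` are made up for this illustration, and the code assumes lines contain no embedded null bytes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one line of any length into a malloc'd buffer.
 * Returns NULL at end of file or if allocation fails; the caller frees it. */
char *read_line(FILE *f) {
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) return NULL;
    while (fgets(buf + len, (int)(cap - len), f) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n')
            return buf;                      /* got the whole line */
        if (len + 1 == cap) {                /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
        /* otherwise fgets stopped at end of file; the next call returns NULL */
    }
    if (len > 0) return buf;                 /* last line had no newline */
    free(buf);
    return NULL;
}

/* Hypothetical helper: call visit(word) for every whitespace-separated word
 * in f (visit may be NULL) and return how many words were seen. */
size_t for_each_word(FILE *f, void (*visit)(const char *word)) {
    const char *delims = " \t\r\n";
    size_t count = 0;
    char *line;
    while ((line = read_line(f)) != NULL) {
        for (char *w = strtok(line, delims); w != NULL; w = strtok(NULL, delims)) {
            if (visit != NULL) visit(w);
            count++;
        }
        free(line);
    }
    return count;
}
```

Because the line buffer grows on demand, no word can overflow it. The trade-off is that a very long line is held in memory all at once.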

answered May 06 '13 at 14:34

jxh

Unless he can guarantee his buffer is large enough, `%s` may be dangerous. Maybe `%1024s`, but this would limit functionality, perhaps. – RageD May 06 '13 at 14:35
NB: You must specify `"%1023s"` for a buffer of size 1024. The length in the format string does not include the terminal null. (Why? Ancient history — at least 30 years too late to change it now.) – Jonathan Leffler May 06 '13 at 14:46
+1 for updated reference. @JonathanLeffler: That is correct, thanks (bah, off-by-one) – RageD May 06 '13 at 14:59
@JonathanLeffler: It is also an oversight that there is no way to inject a computed bound for the string other than building up a string dynamically at runtime (`*` got used for *ignore* instead). – jxh May 10 '18 at 02:22
@jxh: Yes — see [How to prevent `scanf()` causing a buffer overflow in C?](https://stackoverflow.com/questions/1621394/how-to-prevent-scanf-causing-a-buffer-overflow-in-c) – Jonathan Leffler May 10 '18 at 06:18
@JonathanLeffler Do you know if the `m` modifier is widely implemented? – jxh Jan 19 '22 at 17:18
@jxh: It's available on Linux with glibc (GNU C Library). AFAIK, it is not yet available on macOS (and hence, probably not on BSD); certainly, it wasn't available up until Big Sur, but I haven't checked Monterey. I don't suppose it is available on Windows. I'm not sure about AIX, Solaris, HP-UX; I'd guess "yes" for the first two and a tentative "no" for HP-UX, but stand to be proved wrong. – Jonathan Leffler Jan 19 '22 at 17:21
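
The workaround mentioned in these comments, building the format string at runtime so the width matches the buffer, might look roughly like this. The function name `read_word` is invented for the example; it relies only on standard `snprintf` and `fscanf`.

```c
#include <stdio.h>

/* Hypothetical helper: read the next word into buf, which holds bufsize bytes.
 * Returns what fscanf returns: 1 on success, EOF at end of input.
 * A word longer than bufsize - 1 characters is split across several calls. */
int read_word(FILE *f, char *buf, size_t bufsize) {
    char fmt[32];
    if (bufsize < 2) return 0;              /* no room for even one character */
    /* The width excludes the terminating null, hence bufsize - 1.
     * For bufsize == 1024 this produces the format " %1023s". */
    snprintf(fmt, sizeof fmt, " %%%zus", bufsize - 1);
    return fscanf(f, fmt, buf);
}
```

Some compilers warn about a non-literal format string when asked to, since they can no longer check the arguments against it.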

BLUEPIXY · Answer 2 · 2013-05-06T15:34:39.350

Like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

typedef char Type;

typedef struct vector {
    size_t size;
    size_t capacity;
    Type *array;
} Vector;

Vector *vec_make(void){
    Vector *v = malloc(sizeof *v);
    if(v){
        v->size = 0;
        v->capacity = 16;
        v->array = malloc(sizeof(Type) * v->capacity);
        if(!v->array){ /* allocation failed: release the header too */
            free(v);
            v = NULL;
        }
    }
    return v;
}

void vec_add(Vector *v, Type value){
    v->array[v->size] = value;
    /* grow by 16 elements as soon as the buffer becomes full */
    if(++v->size == v->capacity){
        v->array=(Type*)realloc(v->array, sizeof(Type)*(v->capacity += 16));
        if(!v->array){
            perror("memory not enough");
            exit(EXIT_FAILURE);
        }
    }
}

void vec_reset(Vector *v){
    v->size = 0;
}

size_t vec_size(Vector *v){
    return v->size;
}

Type *vec_getArray(Vector *v){
    return v->array;
}

void vec_free(Vector *v){
    free(v->array);
    free(v);
}

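/* Returns the next whitespace-delimited word from fp, or NULL at end of input.
 * The returned pointer refers to an internal buffer that the next call overwrites. */
char *fin(FILE *fp){
    static Vector *v = NULL;
    int ch;

    if(v == NULL) v = vec_make();
    if(v == NULL) return NULL; /* out of memory */
    vec_reset(v);
    while(EOF!=(ch=fgetc(fp))){
        if(isspace(ch)) continue; /* skip leading whitespace */
        while(!isspace(ch)){ /* collect characters up to the next whitespace */
            vec_add(v, ch);
            if(EOF == (ch = fgetc(fp)))break;
        }
        vec_add(v, '\0');
        break;
    }
    if(vec_size(v) != 0) return vec_getArray(v);
    vec_free(v); /* end of input: release the buffer */
    v = NULL;
    return NULL;
}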

int main(void){
    FILE *fp = stdin;
    char *wordp;
    while(NULL!=(wordp=fin(fp))){
        printf("%s\n", wordp);
    }
    return 0;
}
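
To tie word-by-word reading back to the archive in the question, here is a minimal sketch that looks for an opening %content% tag, checks the file name that follows it, and echoes the words up to the closing tag. The function name `print_content` is hypothetical. The sketch assumes that no word exceeds 1023 characters, that the literal word %content% never occurs inside file content or as a file name, and that collapsing runs of whitespace into single spaces is acceptable.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the content words of the entry called name to
 * out, separated by single spaces. Returns the number of words written,
 * or -1 if no such entry exists in the archive. */
int print_content(FILE *in, const char *name, FILE *out) {
    char word[1024];
    while (fscanf(in, " %1023s", word) == 1) {
        if (strcmp(word, "%content%") != 0)
            continue;                        /* not an opening tag: skip it */

        /* After the opening tag come the file name and its path. */
        char entry[1024], path[1024];
        if (fscanf(in, " %1023s %1023s", entry, path) != 2)
            break;                           /* truncated archive */

        int wanted = strcmp(entry, name) == 0;
        int written = 0;

        /* Consume words up to the closing tag, echoing them if wanted. */
        while (fscanf(in, " %1023s", word) == 1 &&
               strcmp(word, "%content%") != 0) {
            if (wanted) {
                fprintf(out, "%s%s", written ? " " : "", word);
                written++;
            }
        }
        if (wanted) {
            fputc('\n', out);
            return written;
        }
    }
    return -1;
}
```

Calling `print_content(fp, "name2", stdout)` on the example archive would then print the words stored for name2. Since reading by words discards the original whitespace, content that must be reproduced byte for byte would need a different scheme, such as storing a length before each content block.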
