2

In a practice exercise to familiarize myself with pointers, I wrote a short program in C, able to read text from a file. I would like to stick to ANSI C.

The program does its job perfectly, however I want to proceed to read columns from a text file and save to separate arrays. Similar questions have been asked, with replies using strtok, or fgets or sscanf, but when should I use one instead of the other?

Here is my commented code:

#include <stdio.h>
#include <stdlib.h>

char *read_file(char *FILE_INPUT);     /*function to read file*/

int main(int argc, char **argv) {
    char *string; // Pointer to a char 

    string = read_file("file.txt");
    if (string) {
        // Writes the string pointed to by string to the stream pointed to by stdout, and appends a new-line character to the output.
        puts(string);
        // Causes space pointed to by string to be deallocated
        free(string);
    }
    return 0;
}

//Returns a pointer to a char,
char *read_file(char *FILE_INPUT) {
    char *buffer = NULL;
    int string_size, read_size;
    FILE *input_stream = fopen(FILE_INPUT, "r");

    //Check if file exists
    if (input_stream == NULL) {
        perror (FILE_INPUT);
    }
    else if (input_stream) {
        // Seek the last byte of the file. Offset is 0 for a text file.
        fseek(input_stream, 0, SEEK_END);
        // Finds out the position of file pointer in the file with respect to starting of the file
        // We get an idea of string_size since ftell returns the last value of the file pos
        string_size = ftell(input_stream);
        // sets the file position indicator for the stream to the start of the file
        rewind(input_stream);

        // Allocate a string that can hold it all
        // malloc returns a pointer to a char, +1 to hold the NULL character
        // (char*) is the cast return type, this is extra, used for humans
        buffer = (char*)malloc(sizeof(char) * (string_size + 1));

        // Read it all in one operation, returns the number of elements successfully read,
        // Reads into buffer, up to string_size whose size is specified by sizeof(char), from the input_stream !
        read_size = fread(buffer, sizeof(char), string_size, input_stream);

        // fread doesn't set it so put a \0 in the last position
        // and buffer is now officially a string
        buffer[string_size] = '\0';

        //string_size determined by ftell should be equal to read_size from fread
        if (string_size != read_size) {
            // Something went wrong, throw away the memory and set
            // the buffer to NULL
            free(buffer);
            buffer = NULL;
        }

        // Always remember to close the file.
        fclose(input_stream);
    }

    return buffer;
}

How can I read all columns from a text file of this format, into an array? Number of columns is fixed, but number of rows can vary.

C 08902019 1020 50 Test1
A 08902666 1040 30 Test2
B 08902768 1060 80 Test3
.
B 08902768 1060 800 Test3000
.
.

On further research, I found that fread is used to allow a program to read and write large blocks of data in a single step, so reading columns separately may not be what fread is intended to do. Thus my program implementation for this kind of job is wrong.

Should I use getc, strtok, sscanf or getline to read such a text file? I am trying to stick to good programming principles and allocate memory dynamically.


EDIT:

By correct I mean (but am not limited to) using good C programming techniques and dynamic memory allocation.

My first thought was to replace fread with fgets. Update, I am getting somewhere thanks to your help.

    // Allocate a string that can hold it all
    // malloc returns a pointer to a char, +1 to hold the NULL character
    // (char*) is the cast return type, this is extra, used for humans
    buffer = (char*)malloc(sizeof(char) * (string_size + 1));

    while (fgets(buffer, string_size + 1, input_stream)) {
        printf("%s", buffer);     
    }

for the above text file prints:

C 08902019 1020 50 Test1

A 08902666 1040 30 Test2

B 08902768 1060 80 Test3

B 08902768 1060 800 Test3000

I also managed to remove the newline character from fgets() input using:

strtok(buffer, "\n"); 

Similar examples here , here and here

How can I proceed to save the columns to separate arrays?

rrz0
  • 2
    "so reading columns separately may not be what fread is intended to do." - correct. Block-storage devices return large buffers of data, it's not worth it to try to read like that from the underlying storage device. – Dai Nov 10 '18 at 08:31
  • 1
    No, you generally need to read all the data from the file, and then can use the necessary data in your program as required. However, if the no of bytes in file for each column is known and constant you can use `fseek` to jump to particular location in the file to read some bytes. – Rishikesh Raje Nov 10 '18 at 08:36
  • Thanks for both comments. Very helpful.. @RishikeshRaje I would like to design my program to be able to read separate columns, irrespective of the number of bytes. Will edit the question. – rrz0 Nov 10 '18 at 08:38
  • @DavidC.Rankin understood, so the recommended way is to use `fgets`? – rrz0 Nov 10 '18 at 08:45
  • If the file is binary, then you are pretty much stuck with the `struct` approach. If it is just text, then yes, `fgets` then `sscanf` (or walk a pair of pointers down the line picking out what you need) Note: you can also use `fgets` then `strtok` to separate (tokenize) the fields. You can do the same thing with `sscanf` using the `"%n"` specifier to determine the number of characters used with each conversion and then offsetting your buffer by that amount for the next conversion. – David C. Rankin Nov 10 '18 at 08:46
  • Yes, this is a text file, updated question for clarity. Thanks for the suggestions. – rrz0 Nov 10 '18 at 08:48
  • If you know the max number of fields (say 5 as shown) you can use `fgets` and then `int numfields = sscanf (buf, "%s %s %s %s %s", col1, col2, ...);` and verify you have the column you are looking for based on the *return*. Not as elegant as handling an unknown number of fields, but if it fits your data..... (also, don't forget to protect your array bounds if you use `"%s"` by including a *field-width* modifier of 1 less than the array size, e.g. `"%31s"` for a `32` char array) – David C. Rankin Nov 10 '18 at 08:52
  • `Cannot read from Input File` is not a useful error message. Let the system tell you what the problem is: `if (input_stream == NULL) { perror (FILE_INPUT);}` – William Pursell Nov 10 '18 at 08:59
  • Thanks @WilliamPursell, noted and will update code. – rrz0 Nov 10 '18 at 09:00
  • @Rrz0 - are the sizes of each field fixed? (e.g. the number of characters in each field?, `col1` has `1-char`, `col2` has `8-char`, etc..) – David C. Rankin Nov 10 '18 at 09:45
  • Ideally none would be fixed. In my case column 1 and column 2 are fixed but 3 and 4 may vary. – rrz0 Nov 10 '18 at 09:47
  • This is opinion based. There is no "correct" way. If you want criticism on your code, https://codereview.stackexchange.com/ is a better fit. Good luck! – kfx Nov 10 '18 at 10:56
  • Hmm, thanks for your comment @kfx, maybe I should remove the word 'correct' from the question. I am simply looking for an efficient way to do what I asked. – rrz0 Nov 10 '18 at 10:57

4 Answers

4

"Best Practices" is somewhat subjective, but "fully validated, logical and readable" should always be the goal.

For reading a fixed number of fields (in your case choosing cols 1, 2, 5 as string values of unknown length and cols 3, 4 as simple int values), you can read an unknown number of rows from a file simply by allocating storage for some reasonably anticipated number of rows of data, keeping track of how many rows are filled, and then reallocating storage, as required, when you reach the limit of the storage you have allocated.

An efficient way of handling the reallocation is to reallocate by some reasonable number of additional blocks of memory when reallocation is required (rather than making calls to realloc for every additional row of data). You can either add a fixed number of new blocks, multiply what you have by 3/2 or 2 or some other sane scheme that meets your needs. I generally just double the storage each time the allocation limit is reached.
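The doubling scheme described above can be sketched as a small standalone helper. This is only an illustration under stated assumptions: the name grow_double, its parameters and the starting capacity of 8 are hypothetical and are not part of the code in this answer.

```c
#include <stdint.h>     /* SIZE_MAX (C99) */
#include <stdlib.h>

/* Hypothetical helper: double the capacity of a dynamically allocated array.
 *   block     - current allocation (may be NULL when *cap is 0)
 *   cap       - in: current capacity in elements; out: new capacity on success
 *   elem_size - size of one element in bytes
 * Returns the (possibly moved) block, or NULL on failure. On failure the
 * original block is untouched and still owned by the caller. */
void *grow_double (void *block, size_t *cap, size_t elem_size)
{
    size_t newcap = *cap ? *cap * 2 : 8;    /* assumed start of 8 elements */
    void *tmp;

    /* refuse if the doubling or the byte count would overflow size_t */
    if (elem_size == 0 || newcap < *cap || newcap > SIZE_MAX / elem_size)
        return NULL;

    tmp = realloc (block, newcap * elem_size);  /* never assign to block here */
    if (!tmp)
        return NULL;

    *cap = newcap;      /* update the capacity only after success */
    return tmp;
}
```

A caller would use it the same way as a bare realloc: store the result in a temporary, check it for NULL, and only then assign it to the original pointer.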

Since you have a fixed number of fields of unknown size, you can make things easy by simply separating the five-fields with sscanf and validating that 5 conversions took place by checking the sscanf return. If you were reading an unknown number of fields, then you would just use the same reallocation scheme to handle the column-wise expansion discussed above for reading an unknown number of rows.

(there is no requirement that any row have the same number of fields in that case, but you can force a check by setting a variable containing the number of fields read with the first row, and then validating that all subsequent rows have that same number...)

As discussed in the comments, reading a line of data with a line-oriented input function like fgets or POSIX getline and then parsing the data by either tokenizing with strtok, or in this case with a fixed number of fields simply parsing with sscanf is a solid approach. It provides the benefit of allowing independent validation of (1) the read of data from the file; and (2) the parse of data into the needed values. (while less flexible, for some data sets, you can do this in a single step with fscanf, but that also injects the scanf problems for user input of what remains unread in the input buffer depending on the conversion-specifiers used...)
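For comparison with the sscanf parse, the strtok alternative mentioned above might look roughly like the sketch below. The helper name split_ws and its interface are assumptions made for illustration; the line is taken to have already been read with fgets into a writable buffer.

```c
#include <string.h>

/* Hypothetical helper: split 'line' in place on spaces, tabs and line ends.
 * Stores up to 'max' token pointers in 'fields' and returns the total number
 * of tokens found, which may be larger than 'max'. The buffer is modified
 * (strtok overwrites separators with '\0') and strtok is not reentrant. */
size_t split_ws (char *line, char **fields, size_t max)
{
    const char *delims = " \t\r\n";
    size_t n = 0;
    char *tok;

    for (tok = strtok (line, delims); tok; tok = strtok (NULL, delims)) {
        if (n < max)            /* keep only as many pointers as fit */
            fields[n] = tok;
        n++;                    /* but count every token for validation */
    }
    return n;
}
```

Comparing the return value with the expected number of columns gives the same kind of validation as checking the sscanf return. The tokens remain strings, so numeric columns would still need a conversion such as strtol.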

The easiest way to approach storage of your 5-fields is to declare a simple struct. Since the number of characters for each of the character fields is unknown, the struct members for each of these fields will be a character pointer, and the remaining fields int, e.g.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ARRSZ   2   /* use 8 or more, set to 2 here to force realloc */
#define MAXC 1024

typedef struct {
    char *col1, *col2, *col5;
    int col3, col4;
} mydata_t;

Now you can start your allocation for handling an unknown number of these by allocating for some reasonably anticipated amount (I would generally use 8 or 16 with a doubling scheme as that will grow reasonably fast), but we have chosen 2 here with #define ARRSZ 2 to make sure we force one reallocation when handling your 3-line data file. Note also we are setting a maximum number of characters per-line of #define MAXC 1024 for your data (don't skimp on buffer size)

To get started, all we need to do is declare a buffer to hold each line, and a few variables to track the currently allocated number of structs, a line counter (to output accurate error messages) and a counter for the number of rows of data we have filled. Then when (rows_filled == allocated_array_size) you realloc, e.g.

int main (int argc, char **argv) {

    char buf[MAXC];
    size_t arrsz = ARRSZ, line = 0, row = 0;
    mydata_t *data = NULL;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    /* allocate an 'arrsz' initial number of struct */
    if (!(data = malloc (arrsz * sizeof *data))) {
        perror ("malloc-data");
        return 1;
    }

    while (fgets (buf, MAXC, fp)) {         /* read each line from file */
        char c1[MAXC], c2[MAXC], c5[MAXC];  /* temp strings for c1,2,5 */
        int c3, c4;                         /* temp ints for c3,4 */
        size_t len = strlen (buf);          /* length for validation */

        line++;     /* increment line count */

        /* validate line fit in buffer */
        if (len && buf[len-1] != '\n' && len == MAXC - 1) {
            fprintf (stderr, "error: line %zu exceeds MAXC chars.\n", line);
            return 1;
        }

        if (row == arrsz) { /* check if all pointers used */
            void *tmp = realloc (data, arrsz * 2 * sizeof *data);
            if (!tmp) {     /* validate realloc succeeded */
                perror ("realloc-data");
                break;      /* break, don't exit, data still valid */
            }
            data = tmp;     /* assign realloc'ed block to data */
            arrsz *= 2;     /* update arrsz to reflect new allocation */
        }

(note: when calling realloc, never assign the result directly to the pointer being reallocated, e.g. data = realloc (data, new_size); If realloc fails (and it can), it returns NULL, which overwrites your original pointer and leaks the block it pointed to. Always realloc with a temporary pointer, validate, then assign the new block of memory to your original pointer.)

What remains is just splitting the line into our values, handling any errors in line format, adding our field values to our array of struct, increment our row/line counts and repeating until we run out of data to read, e.g.

        /* parse buf into fields, handle error on invalid format of line */
        if (sscanf (buf, "%1023s %1023s %d %d %1023s", 
                    c1, c2, &c3, &c4, c5) != 5) {
            fprintf (stderr, "error: invalid format line %zu\n", line);
            continue;   /* get next line */
        }

        /* allocate copy strings, assign allocated blocks to pointers */
        if (!(data[row].col1 = mystrdup (c1))) { /* validate copy of c1 */
            fprintf (stderr, "error: malloc-c1 line %zu\n", line);
            break;      /* same reason to break not exit */
        }
        if (!(data[row].col2 = mystrdup (c2))) { /* validate copy of c2 */
            fprintf (stderr, "error: malloc-c2 line %zu\n", line);
            break;      /* same reason to break not exit */
        }
        data[row].col3 = c3;    /* assign integer values */
        data[row].col4 = c4;
        if (!(data[row].col5 = mystrdup (c5))) { /* validate copy of c5 */
            fprintf (stderr, "error: malloc-c5 line %zu\n", line);
            break;      /* same reason to break not exit */
        }
        row++;      /* increment number of row pointers used */
    }
    if (fp != stdin)    /* close file if not stdin */
        fclose (fp);

    puts ("values stored in struct\n");
    for (size_t i = 0; i < row; i++)
        printf ("%-4s %-10s %4d %4d %s\n", data[i].col1, data[i].col2, 
                data[i].col3, data[i].col4, data[i].col5);

    freemydata (data, row);

    return 0;
}

And we are done (except for the memory use/error check)

Note that the code above relies on two helper functions. mystrdup() allocates storage for each string, copies the string to its allocated block of memory and returns the starting address of that block so it can be assigned to the pointer in our struct. You can use strdup() if you have it; I simply included the function to show you how to manually handle the malloc and copy. Note how the copy is done with memcpy instead of strcpy. Why? You already scanned forward in the string to find '\0' when you found the length with strlen, so there is no need to have strcpy repeat that process; just use memcpy.

/* simple implementation of strdup - in the event you don't have it */
char *mystrdup (const char *s)
{
    if (!s)     /* validate s not NULL */
        return NULL;

    size_t len = strlen (s);            /* get length */
    char *sdup = malloc (len + 1);      /* allocate length + 1 */

    if (!sdup)          /* validate */
        return NULL;

    return memcpy (sdup, s, len + 1);   /* pointer to copied string */ 
}

The last helper function is freemydata(), which just calls free() on each allocated block to ensure you free all the memory you have allocated. It also keeps your code tidy. (You can do the same for the realloc block and move that to its own function as well.)

/* simple function to free all data when done */
void freemydata (mydata_t *data, size_t n)
{
    for (size_t i = 0; i < n; i++) {    /* free allocated strings */
        free (data[i].col1);
        free (data[i].col2);
        free (data[i].col5);
    }
    free (data);    /* free structs */
}

Putting all the pieces together would give you:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ARRSZ   2   /* use 8 or more, set to 2 here to force realloc */
#define MAXC 1024

typedef struct {
    char *col1, *col2, *col5;
    int col3, col4;
} mydata_t;

/* simple implementation of strdup - in the event you don't have it */
char *mystrdup (const char *s)
{
    if (!s)     /* validate s not NULL */
        return NULL;

    size_t len = strlen (s);            /* get length */
    char *sdup = malloc (len + 1);      /* allocate length + 1 */

    if (!sdup)          /* validate */
        return NULL;

    return memcpy (sdup, s, len + 1);   /* pointer to copied string */ 
}

/* simple function to free all data when done */
void freemydata (mydata_t *data, size_t n)
{
    for (size_t i = 0; i < n; i++) {    /* free allocated strings */
        free (data[i].col1);
        free (data[i].col2);
        free (data[i].col5);
    }
    free (data);    /* free structs */
}

int main (int argc, char **argv) {

    char buf[MAXC];
    size_t arrsz = ARRSZ, line = 0, row = 0;
    mydata_t *data = NULL;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    /* allocate an 'arrsz' initial number of struct */
    if (!(data = malloc (arrsz * sizeof *data))) {
        perror ("malloc-data");
        return 1;
    }

    while (fgets (buf, MAXC, fp)) {         /* read each line from file */
        char c1[MAXC], c2[MAXC], c5[MAXC];  /* temp strings for c1,2,5 */
        int c3, c4;                         /* temp ints for c3,4 */
        size_t len = strlen (buf);          /* length for validation */

        line++;     /* increment line count */

        /* validate line fit in buffer */
        if (len && buf[len-1] != '\n' && len == MAXC - 1) {
            fprintf (stderr, "error: line %zu exceeds MAXC chars.\n", line);
            return 1;
        }

        if (row == arrsz) { /* check if all pointers used */
            void *tmp = realloc (data, arrsz * 2 * sizeof *data);
            if (!tmp) {     /* validate realloc succeeded */
                perror ("realloc-data");
                break;      /* break, don't exit, data still valid */
            }
            data = tmp;     /* assign realloc'ed block to data */
            arrsz *= 2;     /* update arrsz to reflect new allocation */
        }

        /* parse buf into fields, handle error on invalid format of line */
        if (sscanf (buf, "%1023s %1023s %d %d %1023s", 
                    c1, c2, &c3, &c4, c5) != 5) {
            fprintf (stderr, "error: invalid format line %zu\n", line);
            continue;   /* get next line */
        }

        /* allocate copy strings, assign allocated blocks to pointers */
        if (!(data[row].col1 = mystrdup (c1))) { /* validate copy of c1 */
            fprintf (stderr, "error: malloc-c1 line %zu\n", line);
            break;      /* same reason to break not exit */
        }
        if (!(data[row].col2 = mystrdup (c2))) { /* validate copy of c2 */
            fprintf (stderr, "error: malloc-c2 line %zu\n", line);
            break;      /* same reason to break not exit */
        }
        data[row].col3 = c3;    /* assign integer values */
        data[row].col4 = c4;
        if (!(data[row].col5 = mystrdup (c5))) { /* validate copy of c5 */
            fprintf (stderr, "error: malloc-c5 line %zu\n", line);
            break;      /* same reason to break not exit */
        }
        row++;      /* increment number of row pointers used */
    }
    if (fp != stdin)    /* close file if not stdin */
        fclose (fp);

    puts ("values stored in struct\n");
    for (size_t i = 0; i < row; i++)
        printf ("%-4s %-10s %4d %4d %s\n", data[i].col1, data[i].col2, 
                data[i].col3, data[i].col4, data[i].col5);

    freemydata (data, row);

    return 0;
}

Now test.

Example Input File

$ cat dat/fivefields.txt
C 08902019 1020 50 Test1
A 08902666 1040 30 Test2
B 08902768 1060 80 Test3

Example Use/Output

$ ./bin/fgets_fields <dat/fivefields.txt
values stored in struct

C    08902019   1020   50 Test1
A    08902666   1040   30 Test2
B    08902768   1060   80 Test3

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.

It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.

$ valgrind ./bin/fgets_fields <dat/fivefields.txt
==1721== Memcheck, a memory error detector
==1721== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==1721== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==1721== Command: ./bin/fgets_fields
==1721==
values stored in struct

C    08902019   1020   50 Test1
A    08902666   1040   30 Test2
B    08902768   1060   80 Test3
==1721==
==1721== HEAP SUMMARY:
==1721==     in use at exit: 0 bytes in 0 blocks
==1721==   total heap usage: 11 allocs, 11 frees, 243 bytes allocated
==1721==
==1721== All heap blocks were freed -- no leaks are possible
==1721==
==1721== For counts of detected and suppressed errors, rerun with: -v
==1721== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Always confirm that you have freed all memory you have allocated and that there are no memory errors.

Look things over and let me know if you have further questions.

David C. Rankin
  • Are you sure, I `line++;` before I hit that block `:)` (didn't I -- I'll check) Right you are! Fixed - thanks - I came back and added the fit-check before the `line++;` and just added it above by happy-mistake. – David C. Rankin Nov 10 '18 at 11:17
  • Hardcoding `1023` in the `sscanf()` calls defeats the purpose of using a define for `MAXC`. The API is somewhat broken, as we all know, but since you define the target arrays with the proper size, you can omit the `1023` and use `%s` directly. – chqrlie Nov 10 '18 at 11:20
  • Yes, but that too was just by chance -- I harp on always using the *field-width* modifier so I tend to always include it in the examples, but in this case I also declared 3-temporary buffers of that size making the *field-width* modifier redundant and since sized at `MAXC` the individual content could never exceed that. Good eyes. I am getting too old to code this late without a few ticks `:)` – David C. Rankin Nov 10 '18 at 11:23
  • I'm afraid I am just as old, but I have an unfair advantage: it is 12:24pm over here. – chqrlie Nov 10 '18 at 11:25
  • First of all, thanks @DavidC.Rankin, for the detailed response, and the working code for me to play around with. This is my first time with `struct`, and I do not understand why you pass `col1`, `col2` and `col5` as pointers to chars, while `col3` and `col4` as integers. As my sample text file, `col4` and `col5` may have varying sizes, while the others are fixed. Thanks once again. – rrz0 Nov 11 '18 at 09:17
  • I just looked at the data and said let's let `col 1, 2, 5` have any number of characters - so they are character pointers which are individually allocated and whatever is in `1, 2, 5` is stored. `col 3, 4` looked like numbers, so I made them `int` and they can hold any number from `-2147483648 to 2147483647` (so if your `col 3, 4` can be bigger than that -- you can change the type). You could make them all character pointers and allocate for all of them, but, if they are already integer, it's a whole lot easier just to store them that way. No tricks, just looking at what was there. – David C. Rankin Nov 11 '18 at 10:36
1

I want to proceed to read only certain columns of this text file.

You can do this with any input function: getc, fgets, sscanf, getline... but you must first define exactly what you mean by certain columns.

  • columns can be defined as separated by a specific character such as ,, ; or TAB, in which case strtok() is definitely not the right choice because it treats all sequences of separating characters as a single separator: hence a,,b would be seen as having only 2 columns.
  • if they are instead separated by whitespace, any sequence of spaces or tabs, strtok, strpbrk or strspn / strcspn might come in handy.
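To make the difference concrete, here is a minimal sketch of a field scanner that keeps empty fields, so that a,,b yields three columns where strtok would report two. The name next_field and its cursor-style interface are assumptions made for this illustration.

```c
#include <string.h>

/* Hypothetical helper: return a pointer to the next field in the string that
 * *cursor points into, and store the field's length in *len. Fields are
 * separated by the single character 'sep' and may be empty. *cursor is
 * advanced past the separator, or set to NULL after the last field, so the
 * following call returns NULL. The input string is not modified. */
const char *next_field (const char **cursor, char sep, size_t *len)
{
    const char *start = *cursor;
    const char *end;

    if (!start)                 /* previous call consumed the last field */
        return NULL;

    /* a '\0' separator makes no sense; treat the rest as one field */
    end = sep ? strchr (start, sep) : NULL;

    if (end) {
        *len = (size_t)(end - start);
        *cursor = end + 1;      /* resume just after the separator */
    }
    else {
        *len = strlen (start);
        *cursor = NULL;         /* mark the input as exhausted */
    }
    return start;
}
```

Because the fields are not '\0'-terminated, a caller would copy len bytes (for example with memcpy) before treating a field as a C string. A trailing newline left by fgets would end up in the last field unless it is stripped first.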

In any case, you can read the file line by line with fgets but you might have a problem with very long lines. getline is a solution, but it might not be available on all systems.
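Where getline is not available, the long-line problem can be handled with fgets and a growing buffer. The following is a sketch only: read_line is a hypothetical name, the initial size of 128 is arbitrary, and lines are assumed to be far shorter than INT_MAX and free of embedded '\0' bytes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one line of any length from fp using only
 * standard C. Returns a malloc'ed string without the trailing '\n' (the
 * caller frees it), or NULL at end of file, on error with nothing read,
 * or if allocation fails. */
char *read_line (FILE *fp)
{
    size_t cap = 128, len = 0;          /* arbitrary starting capacity */
    char *line = malloc (cap);

    if (!line)
        return NULL;

    while (fgets (line + len, (int)(cap - len), fp)) {
        len += strlen (line + len);     /* account for what was just read */
        if (len && line[len - 1] == '\n') {
            line[--len] = '\0';         /* complete line: strip newline */
            return line;
        }
        if (cap - len < 2) {            /* buffer full: double it */
            char *tmp = realloc (line, cap * 2);
            if (!tmp) {
                free (line);
                return NULL;
            }
            line = tmp;
            cap *= 2;
        }
    }
    if (len)            /* final line had no trailing newline */
        return line;

    free (line);        /* nothing read: end of file or error */
    return NULL;
}
```

Each call returns one complete line regardless of its length, so the parsing step no longer needs a check for lines that did not fit the buffer.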

chqrlie
  • Thanks for your helpful answer. I edited the question. I want to read all columns, and each column into separate arrays. I am open to use either specific characters such as `,` or else simply white space, depending on which is 'easier' to implement. – rrz0 Nov 10 '18 at 11:00
  • The answer [here](https://stackoverflow.com/questions/12499219/using-fgets-and-strtok-to-read-in-a-file-line-by-line-in-c) uses `strtok()` for both `,` and whitespace separated columns. Why is `strtok()` not a good choice for the first case you mentioned? – rrz0 Nov 10 '18 at 11:07
  • @Rrz0: I amended the answer to explain why `strtok` is inappropriate for `,`. `strtok` has other issues such as non-reentrancy that make it a poor candidate for most parsing tasks. – chqrlie Nov 10 '18 at 11:13
0

If you know what the column separator is and how many columns you have, you can use getdelim (the variant of getline that takes a delimiter, documented on the same page) with the column separator and then with the line separator.

Here is getline:

http://man7.org/linux/man-pages/man3/getline.3.html

It is very good because it allocates space for you, no need to know how many bytes is your column or line.

Or you just use getline as in code example in link to read whole line then you "parse" and extract columns as you wish....
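A rough sketch of that second approach, assuming a POSIX system and the five-column layout shown in the question. The function name count_valid_rows is made up for this example, and it only counts the rows that parse; storing the values is left out.

```c
#define _POSIX_C_SOURCE 200809L     /* expose getline() in <stdio.h> */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read fp line by line with POSIX getline and count the
 * lines that match the assumed layout "string string int int string". */
size_t count_valid_rows (FILE *fp)
{
    char *line = NULL;      /* getline allocates and grows this buffer */
    size_t cap = 0, rows = 0;

    while (getline (&line, &cap, fp) != -1) {
        int c3, c4, end = 0;

        /* %*s skips a string without storing it; %n records how far the
         * parse got and is only reached if every field before it matched */
        if (sscanf (line, "%*s %*s %d %d %*s%n", &c3, &c4, &end) == 2
                && end > 0)
            rows++;
    }
    free (line);            /* a single free, whether or not a line was read */
    return rows;
}
```

The same buffer is reused for every line, so there is only one allocation to release at the end.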

If you paste exactly how you want to run program with input you show I can try write fast C program for good answer. Now it is just comment-style answer with too many words for comment :-(

Or is it somehow you cannot use library?

Although while waiting for better question I will note that you can use awk to read columns from text file but probably this is not what you want? Because what are you trying to do really?

  • @DavidC.Rankin you are most certainly very correct but I thought the question say: "able to read text from a file" and I see small example on bottom with text in columns. So we sit and wait and OP tells us what it is.... If binary format usually I find in binary format file lengths **before** content so then of course reading is not as reading from free-form text file. But thank you so much for comment! :-) –  Nov 10 '18 at 08:49
  • Thanks for your answer. File I want to read from is a text file. I don't understand what you mean here: "If you paste exactly how you want to run program with input you show". I want my program to be able to read, all columns from the text file into separate arrays, for further use. – rrz0 Nov 10 '18 at 08:50
  • @DavidC.Rankin Other option is to read and check byte and check allocated memory and grow memory and many things but then this is much code and who knows what OP really wants? Maybe necessary maybe not but much code to write in a hurry. –  Nov 10 '18 at 08:51
  • @Rrz0 You say now "all columns" but in your question you say "skip columns". So you want to read all columns into array or what length? or should the array know its length? why not read into strings and maybe just save positions of column starts and ends? is that any good? –  Nov 10 '18 at 08:52
  • @MunDong - sorry, you are correct about the text file, I was thrown off by the suggestion of `fread`... – David C. Rankin Nov 10 '18 at 08:54
  • @MunDong, I want to read text file columns into arrays. Columns are of known length, but there a varying number of rows. – rrz0 Nov 10 '18 at 08:57
  • @Rrz0 but do you know which columns? or all columns? and what do you mean array? array of strings of different length for each line or array of something else? Do you want to save length of array somewhere? So many questions.... –  Nov 10 '18 at 09:02
  • @Rrz0 and please correct question if it is not correct. You say: "Number of columns is fixed - Number of rows can vary" but then you show example where number of **columns** vary?? I am confuse. –  Nov 10 '18 at 09:03
  • @MunDong Number of columns separated by a space do not vary. I had edited the question. Also I did not specify exactly since I thought this was irrelevant to the question. I was looking for a general structure of how can one read columns into a text file using good C programming principles. However: all columns, array of strings of varying length – rrz0 Nov 10 '18 at 09:04
  • @Rrz0 how about column separator? Do you know column separator in advance or not? and do you have CSV-style embedding of separator inside column or do you escape separator somehow or just separator not allowed inside column?..... so many questions. Y U no use `awk`? –  Nov 10 '18 at 09:07
  • @Rrz0 "I was looking for a general structure of how can one read columns into a text file using good C programming principles." I am probably wrong but you really need to define question better. Good principles come from defined problems. –  Nov 10 '18 at 09:25
  • I edited the question to answer some of your concerns. Please take a look and tell me what I can improve. I will do my best. – rrz0 Nov 10 '18 at 09:26
  • @Rrz0 you should probably read source code of `awk` if you are serious about learning good C programming principles. –  Nov 10 '18 at 09:28
  • @Rrz0 very good edit to question! But still can you confirm what is column separator and if you can have column separator inside column and if you do how you escape or quote it??? You should maybe look at different column formats (try first CSV, https://tools.ietf.org/html/rfc4180 ) –  Nov 10 '18 at 09:35
  • Best to do it without a column separator, but I can use ' , ' for example. – rrz0 Nov 10 '18 at 10:44
  • @Rrz0 You can do many things about columns: fixed width columns no separator OR column separator but escape inside column and double escape for escape char inside column OR column separator but quote if separator inside column. Must choose something that is defined, if it is not defined how do you know how to program?? Did you see RFC4180 link I put in comment? You can also look for `cut` program in Linux and of course `awk` for ideas about what you really want. –  Nov 10 '18 at 10:59
  • @Rrz0 last comment: C is great language but you must really understand what is the problem you want to solve with C because C is not just Python "ah well for lines in file split no problem" you must know what you split but more important you must understand **why** because this defines your edge conditions then you can turn impossible problem into smaller problem that you can solve no problem. –  Nov 10 '18 at 11:04
0

Depending on the data and daring, you could use scanf or a parser created with yacc/lex.