15

I'm used to PHP, but I'm starting to learn C. I'm trying to create a program that reads a file line by line and stores each line to an array.

So far I have a program that reads the file line by line, and even prints each line as it goes, but now I just need to add each line to an array.

My buddy last night was telling me a bit about it. He said I'd have to use a multidimensional array in C, so basically array[x][y]. The [y] part itself is easy, because I know the maximum amount of bytes that each line will be. However, I don't know how many lines the file will be.

I figure I can make it loop through the file and just increment an integer each time and use that, but I feel that there might be a more simple way of doing it.

What can I try next?

halfer
  • 19,824
  • 17
  • 99
  • 186
Rob
  • 7,980
  • 30
  • 75
  • 115
  • You can use the function `realloc` to change the size of the array later on. – Jonathon Aug 28 '11 at 15:53
  • I'll go look up that function and try to think of how I can implement it and I'll get back to you, thanks – Rob Aug 28 '11 at 15:58

6 Answers6

14

To dynamically allocate a 2D array:

char **p;
int i, dim1, dim2;


/* Allocate the first dimension, which is actually a pointer to pointer to char   */
p = malloc (sizeof (char *) * dim1);

/* Then allocate each of the pointers allocated in previous step arrays of pointer to chars
 * within each of these arrays are chars
 */
for (i = 0; i < dim1; i++)
  {
    *(p + i) = malloc (sizeof (char) * dim2);
   /* or p[i] =  malloc (sizeof (char) * dim2); */
  }

 /* Do work */

/* Deallocate the allocated array. Start deallocation from the lowest level.
 * that is in the reverse order of which we did the allocation
 */
for (i = 0; i < dim1; i++)
{
  free (p[i]);
}
free (p);

Modify the above method. When you need another line to be added do *(p + i) = malloc (sizeof (char) * dim2); and update i. In this case you need to predict the max numbers of lines in the file which is indicated by the dim1 variable, for which we allocate the p array first time. This will only allocate the (sizeof (int *) * dim1) bytes, thus much better option than char p[dim1][dim2] (in c99).

There is another way i think. Allocate arrays in blocks and chain them when there is an overflow.

struct _lines {
   char **line;
   int n;
   struct _lines *next;
} *file;

file = malloc (sizeof (struct _lines));
file->line = malloc (sizeof (char *) * LINE_MAX);
file->n = 0;
head = file;

After this the first block is ready to use. When you need to insert a line just do:

/* get line into buffer */
file.line[n] = malloc (sizeof (char) * (strlen (buffer) + 1));
n++;

When n is LINE_MAX allocate another block and link it to this one.

struct _lines *temp;

temp = malloc (sizeof (struct _lines));
temp->line = malloc (sizeof (char *) * LINE_MAX);
temp->n = 0;
file->next = temp;
file = file->next;

Something like this.

When one block's n becomes 0, deallocate it, and update the current block pointer file to the previous one. You can either traverse from beginning single linked list and traverse from the start or use double links.

phoxis
  • 60,131
  • 14
  • 81
  • 117
  • The latter approach is how GNU's `tail(1)` implementation works, I believe, when the input file is non-seekable (such as a pipe or stdin). Since it can only make one pass through the input file in those cases, it stores the file in a linked list of memory blobs, and when it hits end-of-file, it searches backwards to print out the last `N` lines. – Adam Rosenfield Aug 28 '11 at 16:17
  • @Adam Rosenfield: Didn't know that, thanks for the information. I used this to store a long list of words in memory long ago, and it is pretty useful. – phoxis Aug 28 '11 at 16:20
7

There's no standard resizable array type in C. You have to implement it yourself, or use a third-party library. Here's a simple bare-bones example:

typedef struct int_array
{
    int *array;
    size_t length;
    size_t capacity;
} int_array;

void int_array_init(int_array *array)
{
    array->array = NULL;
    array->length = 0;
    array->capacity = 0;
}

void int_array_free(int_array *array)
{
    free(array->array);
    array->array = NULL;
    array->length = 0;
    array->capacity = 0;
}

void int_array_push_back(int_array *array, int value)
{
    if(array->length == array->capacity)
    {
        // Not enough space, reallocate.  Also, watch out for overflow.
        int new_capacity = array->capacity * 2;
        if(new_capacity > array->capacity && new_capacity < SIZE_T_MAX / sizeof(int))
        {
            int *new_array = realloc(array->array, new_capacity * sizeof(int));
            if(new_array != NULL)
            {
               array->array = new_array;
               array->capacity = new_capacity;
            }
            else
                ; // Handle out-of-memory
        }
        else
            ; // Handle overflow error
    }

    // Now that we have space, add the value to the array
    array->array[array->length] = value;
    array->length++;
}

Use it like this:

int_array a;
int_array_init(&a);

int i;
for(i = 0; i < 10; i++)
    int_array_push_back(&a, i);
for(i = 0; i < a.length; i++)
    printf("a[%d] = %d\n", i, a.array[i]);

int_array_free(&a);

Of course, this is only for an array of ints. Since C doesn't have templates, you'd have to either put all of this code in a macro for each different type of array (or use a different preprocessor such as GNU m4). Or, you could use a generic array container that either used void* pointers (requiring all array elements to be malloc'ed) or opaque memory blobs, which would require a cast with every element access and a memcpy for every element get/set.

In any case, it's not pretty. Two-dimensional arrays are even uglier.

Adam Rosenfield
  • 390,455
  • 97
  • 512
  • 589
2

Instead of an array here, you could also use a linked list, The code is simpler, but the allocation is more frequent and may suffer from fragmentation.

As long as you don't plan to do much random access (Which is O(n) here), iteration is about as simple as a regular array.

typedef struct Line Line;
struct Line{
    char text[LINE_MAX];
    Line *next;
};

Line *mkline()
{
    Line *l = malloc(sizeof(Line));
    if(!l)
       error();
    return l;
}

main()
{
    Line *lines = mkline();
    Line *lp = lines;
    while(fgets(lp->text, sizeof lp->text, stdin)!=NULL){
         lp->next = mkline();
         lp = lp->next;
    }
    lp->next = NULL;
}
Dave
  • 10,964
  • 3
  • 32
  • 54
1

If you are using C you will need to implement the resizing of the array yourself. C++ and the SDL has this done for you. It is called a vector. http://www.cplusplus.com/reference/stl/vector/

Ed Heal
  • 59,252
  • 17
  • 87
  • 127
1

While a multidimensional array can solve this problem, a rectangular 2D array would not really be the natural C solution.

Here is a program that initially reads the file into a linked list, and then allocates a vector of pointers of the right size. Each individual character does then appear as array[line][col] but in fact each row is only as long as it needs to be. It's C99 except for <err.h>.

#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct strnode {
  char *s;
  struct strnode *next;
} strnode;

strnode *list_head;
strnode *list_last;

strnode *read1line(void) {
  char space[1024];
  if(fgets(space, sizeof space, stdin) == NULL)
    return NULL;
  strnode *node = malloc(sizeof(strnode));
  if(node && (node->s = malloc(strlen(space) + 1))) {
    strcpy(node->s, space);
    node->next = NULL;
    if (list_head == NULL)
      list_head = node;
    else
      list_last->next = node;
    list_last = node;
    return node;
  }
  err(1, NULL);
}

int main(int ac, char **av) {
  int n;
  strnode *s;

  for(n = 0; (s = read1line()) != NULL; ++n)
    continue;
  if(n > 0) {
    int i;
    strnode *b;
    char **a = malloc(n * sizeof(char *));
    printf("There were %d lines\n", n);
    for(b = list_head, i = 0; b; b = b->next, ++i)
      a[i] = b->s;
    printf("Near the middle is: %s", a[n / 2]);
  }
  return 0;
}
DigitalRoss
  • 143,651
  • 25
  • 248
  • 329
1

You can use the malloc and realloc functions to dynamically allocate and resize an array of pointers to char, and each element of the array will point to a string read from the file (where that string's storage is also allocated dynamically). For simplicity's sake we'll assume that the maximum length of each line is less than M characters (counting the newline), so we don't have to do any dynamic resizing of individual lines.

You'll need to keep track of the array size manually each time you extend it. A common technique is to double the array size each time you extend, rather than extending by a fixed size; this minimizes the number of calls to realloc, which is potentially expensive. Of course that means you'll have to keep track of two quantities; the total size of the array and the number of elements currently read.

Example:

#define INITIAL_SIZE ... // some size large enough to cover most cases

char **loadFile(FILE *stream, size_t *linesRead)
{
  size_t arraySize = 0;   
  char **lines = NULL;
  char *nextLine = NULL;

  *linesRead = 0;

  lines = malloc(INITIAL_SIZE * sizeof *lines);
  if (!lines)
  {
    fprintf(stderr, "Could not allocate array\n");
    return NULL;
  }

  arraySize = INITIAL_SIZE;

  /**
   * Read the next input line from the stream.  We're abstracting this
   * out to keep the code simple.
   */
  while ((nextLine = getNextLine(stream)))  
  {
    if (arraySize <= *linesRead)
    {
      char **tmp = realloc(lines, arraysSize * 2 * sizeof *tmp);
      if (tmp)
      {
        lines = tmp;
        arraySize *= 2;
      }
    }
    lines[(*linesRead)++] = nextLine;
  )

  return lines;
}
John Bode
  • 119,563
  • 19
  • 122
  • 198