1

I've been working through K&R and am trying to solve exercise 5-13, which states

Write the program tail, which prints the last n lines of its input. By default, n is 10, say, but it can be changed by an optional argument, so that

tail -n

prints the last n lines. The program should behave rationally no matter how unreasonable the input or the value of n. Write the program so it makes the best use of available storage; lines should be stored as in the sorting program of Section 5.6, not in a two-dimensional array of fixed size.

Here's my algorithm

  • If argc == 1 then set n = 10 else n is the second argument
  • Dynamically create an array of character pointers of size n. This will hold the pointers to the lines that have to be printed.
  • Call readlines. Readlines starts taking input from the user until EOF is encountered. Every time a newline is encountered, that's a line and malloc is called with the length of that line as the argument. This returns a pointer that is cast as a character pointer and the dynamically created array of pointers holds this pointer according to this - array_of_pointers[nlines % n] (where nlines is the number of the current line).
  • After all the lines have been read, readlines return nlines.
  • writelines is called with n (the command line argument), nlines and the array of pointers as it's arguments and it prints the lines accordingly - j = nlines - n; j < nlines; j++ and the (j % n)th pointer to character in the array of pointers is printed.

Here's the code I wrote for this

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXLEN 1000

int my_getline(char line[], int maxline)
{
  int c, n = 0;

  while (((c = getchar()) != EOF) || (c != '\n')) 
    line[n++] = c;
  if (c == '\n')
    line[n++] = c;
  line[n] = '\0';

  return n;
}

int readlines(int n, char **pa)
{
  int len, nlines = -1;
  char *p, line[MAXLEN];

  nlines = 0;
  while ((len = my_getline(line, MAXLEN)) > 0) {
    if ((p = (char *) malloc(len)) == NULL)
      return -1;
    else {
      line[len-1] = '\0';
      strcpy(p, line);

      pa[++nlines % n] = p;
    }
  }
  return nlines;
}

void writelines(char **pa, int n, int nlines)
{
  int j;

  for (j = nlines - n; j < nlines; j++) {
    printf("%s\n", *pa[j % n]);
  }
}

int main(int argc, char *argv[])
{
  int n, nlines;
  char **pa;

  (argc == 1) ? (n = 10) : (n = atoi(*++argv));
  pa = (char *) malloc(n * sizeof(char*));
  nlines = readlines(n, &pa);
  writelines(&pa, n, nlines);

  free(pa);
  return 0;
}

I have two problems

  1. Clearly somewhere my interpretation of pa (the array of pointers) is wrong, because I get a ton of errors when I pass pa to readlines and writelines and try to write to it. How do I fix these?
  2. I know that after you're done, you're supposed to free your memory. I can free the memory of the array of pointers (pa) using free(pa), but how would one go about freeing the memory of p in readlines. The question states I should make the "best use of available storage" that means after I have read n lines, I should ideally free the 1st line when the 11th line is, 2nd line when the 12th line is read and so on. But I don't know how to do this.

Sorry in advance. I'm a newbie with C and this pointers to pointers business along with dynamic memory allocation is really muddling up my brain.

  • `pa[++lines % n] = p;` hmm... I can't find the variable `lines` - where is it declared? – Support Ukraine Apr 24 '20 at 06:28
  • `line[line-1] = '\0';` hmm... the array is used as index into the array. Sure that is what you wanna do? – Support Ukraine Apr 24 '20 at 06:29
  • 2
    Well -- of course you do. `nlines = readlines(n, &pa)` where `pa` is already `char**` so by sending `&pa` you are sending `char***` (becoming a 3-Star Programmer) Also, in C, there is no need to cast the return of `malloc`, it is unnecessary. See: [Do I cast the result of malloc?](http://stackoverflow.com/q/605845/995714) Why are you not using `fgets()` fo read the lines? – David C. Rankin Apr 24 '20 at 06:29
  • @4386427 Damn. Stupid typos. My bad. I meant nlines and len respectively. Edited. – Apoorva Anand Apr 24 '20 at 06:33
  • `pa = (char *) malloc(n * sizeof(char*));` Well, you cast `malloc` to char pointer but `pa` is not a char pointer - it's a pointer to char pointer. So you cast should be `(char**)`. However, **in C you don't cast malloc** - so simply remove the casting. – Support Ukraine Apr 24 '20 at 06:33
  • When you call `readlines` and `writelines` you pass `&pa`. OK but `&pa` is a `char***` (3 starts) but the functions expects `char**` (2 starts). As far as I can see, you should on pass `pa` – Support Ukraine Apr 24 '20 at 06:37
  • @DavidC.Rankin Thanks for the tip on malloc. I should've read the chapter on malloc instead of googling it. I might've gotten confused there. I'll remove the cast. I am using my_getline mostly because I have not encountered fgets yet in the book. – Apoorva Anand Apr 24 '20 at 06:38
  • @ApoorvaAnand So the above comments tells one thing. You have ignored all warnings from the compiler. **Don't ignore compiler warnings!!** The first thing to do is to turn the compiler warning level up (for gcc at least use -Wall) and then make sure you get a clean compile, i.e. **no warnings**. Don't even run the program before it compiles cleanly. – Support Ukraine Apr 24 '20 at 06:40
  • 1
    Take a look at this: `while (((c = getchar()) != EOF) || (c != '\n'))` Question: Which return value from `getchar` will make the logical expression false? – Support Ukraine Apr 24 '20 at 06:42
  • I don't quite understand how doing &pa makes it a 3 star pointer. Can you help me understand that? Point me to the right resources? In the book, whenever a pointer has been passed to a function it has been passed by using & to pass the address of the variable. – Apoorva Anand Apr 24 '20 at 06:44
  • I'm really sorry everyone. I should've error checked the rest of the program before posting it here. I am fixing them all as we speak. But for now, can you help me understand the 3 star pointer thing along with how I would free ```p```? – Apoorva Anand Apr 24 '20 at 06:48
  • 1
    @ApoorvaAnand Whenever you do `&` on some variable the type of the resulting object will have one more * than the variable, i.e. `int x; int* p = &x; int** pp = &p; int*** ppp = &pp` and so on. Your `pa` is a `char**`so `&pa` is a `char***` – Support Ukraine Apr 24 '20 at 06:49
  • So for a function to be able to change pa, I don't have to pass pa's address? I can pass pa (as long as it is a pointer) and the function's arguments are pointers without using the & at all? – Apoorva Anand Apr 24 '20 at 06:52
  • 1
    @ApoorvaAnand you ask: "So for a function to be able to change pa, I don't have to pass pa's address?" oh yes, absolutely. **IF** you want to change `pa` in a function, you must pass `&pa`. But take a look at your code - do you change `pa` ? No. You change what `pa` points to but you don't change the value of `pa` itself. Huge and very important difference. It's very fundamental in C - it's something you need to fully understand. – Support Ukraine Apr 24 '20 at 06:57
  • Thank you. I got the program to work. I think I need to read more about pointers. There's a case I need to take care of, that is, when nlines < n. I shall fix that. Thank you for helping me. Here's my program https://gist.github.com/apoorvaanand1998/be3894a3fdd1e3cb5fa63a914bd6e3ac. – Apoorva Anand Apr 24 '20 at 07:02
  • @ApoorvaAnand Your welcome :-) Make sure you study and fully understand the difference between passing a pointer to a function and passing the address of a pointer to a function. Have fun studying C – Support Ukraine Apr 24 '20 at 07:05
  • What about my second question? The freeing of ```p```? – Apoorva Anand Apr 24 '20 at 07:06
  • 1
    @ApoorvaAnand Give me 5 min and I'll post an answer for the free – Support Ukraine Apr 24 '20 at 07:10
  • @ApoorvaAnand The answer is ready – Support Ukraine Apr 24 '20 at 07:25

2 Answers2

2

Well, it's obvious you are thinking along the proper lines of logic to get to the simulated tail of lines, but you appear to be stumbling on how to approach handling the memory allocation, reallocation and freeing. (which is likely to point of the exercise).

While there is nothing that prevents you from allocating your pointers for pa in main() and passing that parameter to readlines(), that is somewhat an awkward way to do it. When you think of creating a function that will allocated storage for an object, let the function allocate for the complete object and return a pointer to the object on success, or return NULL on failure. That way the calling function knows if the function returns a valid pointer it is responsible for freeing the memory associated with the object (instead of some of the memory being allocated in different places). If the function returns NULL -- the caller knows the function failed and it need not worry about any memory for the object.

This also frees you from having to pass a parameter for the object. Since you will allocate for a complete object in the function, simply change the return type to the type of your object, (char** here) and pass a pointer-to the memory holding the number of lines to output. Why a pointer? If less than that number of lines are stored (either because the file being read has fewer lines, or you ran out of memory before storing all lines), you can update the value at that address with the actual number of lines stored and make that number available back in the caller (main() here).

With those changes, you can declare your function as:

char **readlines (int *n)
{

Within your function you need to declare a line-counter, a buffer to hold the line read from the file (which I presume your MAXLEN is for) and declare and allocate the pointers for your object, validating every allocation. For example:

    int ndx = 0;    /* line counter */
    char buf[MAXLEN], **pa = malloc (*n * sizeof *pa);  /* allocate pointers */

    if (!pa) {                      /* validate pointer allocation */
        perror ("malloc-pa");
        return pa;
    }
    for (int i = 0; i < *n; i++)    /* initialize all pointers NULL */
        pa[i] = NULL;

Note above, the pointers have all been initialized NULL which will allow both the initial allocation and any needed reallocations to be handled by realloc(). Also note that instead of using malloc for the pointers, you can use calloc which will set all bytes zero (and for all compilers I know, cause the pointers to evaluate as NULL without having to explicitly loop setting them). However that isn't guaranteed by the standard -- so a loop is proper.

Here fgets() is used to read each line and strcspn() is used to trim the '\n' and get the length of each line -- you can use whatever function you like. After the line is read, trimmed and length obtained, the memory is allocated (or reallocated) to hold the line and the line is copied to the new block of memory. Your nlines % n index is thinking correctly, but you don't increment nlines until after the allocation and assignment, e.g.

(note: Edited below to treat any line reallocation failure as terminal, and Free All Memory returning NULL as discussed in the comments with @4386427 -- needed due to cyclical use of indexes and any failure after all lines originally allocated would result in unusable partial results (non-sequential line output))

    while (fgets (buf, MAXLEN, stdin)) {        /* read each line of input */
        void *tmp;                              /* tmp to realloc with */
        size_t len;                             /* line length */
        buf[(len = strcspn (buf, "\n"))] = 0;   /* trim '\n', get length */

        /* always realloc to a temporary pointer, validate before assigning */
        if (!(tmp = realloc (pa[ndx % *n], len + 1))) {
            int rm = ndx > *n ? *n : ndx;       /* detrmine no. of lines to free */
            perror ("realloc-pa[ndx % *n]");

            while (rm--)                        /* loop freeing each allocated line */
                free (pa[rm]);
            free (pa);                          /* free pointers */

            return NULL;
        }
        pa[ndx % *n] = tmp;                     /* assign new block to pa[ndx%n] */
        memcpy (pa[ndx % *n], buf, len + 1);    /* copy line to block of memory */

        ndx++;      /* increment line count */
    }

(note: if allocation fails for any allocated line, all allocated lines are freed along with the pointers and NULL is returned avoiding any memory leak. Your continued overwriting of each pointer with a new address for each newly allocated block with continually leak memory that can no longer be freed -- you have lost the start address to the original block when you overwrite the pointer)

The final thing you do before returning your allocated object is to check if your index is less than the value for *n' and if it is, update the value at that address so the actual number of lines stored is available back in the caller, e.g.

    if (ndx < *n)   /* if less than *n lines read */
        *n = ndx;   /* update number at that address with ndx */

    return pa;      /* return allocated object */
}

That's basically it for your function. Putting it altogether with the output simply written from main(), you would have:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

#define NLINES   10     /* default number of lines */
#define MAXLEN 1000     /* max characters per-line */

/* create and store last *n lines from stdin in allocated object,
 * returning pointer to object on success, and updating value at n,
 * if less than NLINES lines read. Return NULL on failure. Caller
 * is responsible for freeing allocated memory.
 */
char **readlines (int *n)
{
    int ndx = 0;    /* line counter */
    char buf[MAXLEN], **pa = malloc (*n * sizeof *pa);  /* allocate pointers */

    if (!pa) {                      /* validate pointer allocation */
        perror ("malloc-pa");
        return pa;
    }
    for (int i = 0; i < *n; i++)    /* initialize all pointers NULL */
        pa[i] = NULL;

    while (fgets (buf, MAXLEN, stdin)) {        /* read each line of input */
        void *tmp;                              /* tmp to realloc with */
        size_t len;                             /* line length */
        buf[(len = strcspn (buf, "\n"))] = 0;   /* trim '\n', get length */

        /* always realloc to a temporary pointer, validate before assigning */
        if (!(tmp = realloc (pa[ndx % *n], len + 1))) {
            int rm = ndx > *n ? *n : ndx;       /* detrmine no. of lines to free */
            perror ("realloc-pa[ndx % *n]");

            while (rm--)                        /* loop freeing each allocated line */
                free (pa[rm]);
            free (pa);                          /* free pointers */

            return NULL;
        }
        pa[ndx % *n] = tmp;                     /* assign new block to pa[ndx%n] */
        memcpy (pa[ndx % *n], buf, len + 1);    /* copy line to block of memory */

        ndx++;      /* increment line count */
    }

    if (ndx < *n)   /* if less than *n lines read */
        *n = ndx;   /* update number at that address with ndx */

    return pa;      /* return allocated object */
}

int main (int argc, char **argv) {

    char *p = NULL, **lines = NULL;         /* pointers for strtol, and lines */
    int n = argc > 1 ? (int)strtol (argv[1], &p, 0) : NLINES;

    if (n != NLINES && (errno || p == argv[1])) {   /* validate conversion */
        fprintf (stderr, "error: invalid no. of lines '%s'\n", argv[1]);
        return 1;
    }

    if (!(lines = readlines(&n))) {             /* read lines validate return */
        fputs ("error: readlines failed.\n", stderr);
        return 1;
    }

    for (int i = 0; i < n; i++) {               /* loop over each stored line */
        puts (lines[i]);                        /* output line */
        free (lines[i]);                        /* free storage for line */
    }
    free (lines);                               /* free pointers */
}

(you can add the functions you like to replace the read with fgets() and the output loop in main(), as desired).

Example Use/Output

Default behavior:

$ printf "%s\n" line{1..20} | ./bin/tail
line11
line12
line13
line14
line15
line16
line17
line18
line19
line20

Output only 5 lines instead of default:

$ printf "%s\n" line{1..20} | ./bin/tail 5
line16
line17
line18
line19
line20

Handle less than default number of lines in file:

$ printf "%s\n" line{1..5} | ./bin/tail
line1
line2
line3
line4
line5

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.

It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.

$ printf "%s\n" line{1..20} | valgrind ./bin/tail 5
==25642== Memcheck, a memory error detector
==25642== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==25642== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==25642== Command: ./bin/tail 5
==25642==
line16
line17
line18
line19
line20
==25642==
==25642== HEAP SUMMARY:
==25642==     in use at exit: 0 bytes in 0 blocks
==25642==   total heap usage: 23 allocs, 23 frees, 5,291 bytes allocated
==25642==
==25642== All heap blocks were freed -- no leaks are possible
==25642==
==25642== For counts of detected and suppressed errors, rerun with: -v
==25642== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Always confirm that you have freed all memory you have allocated and that there are no memory errors.

Look things over and let me know if you have further questions.

David C. Rankin
  • 81,885
  • 6
  • 58
  • 85
  • Oh wow. You dropped some major knowledge here. This is the kind of answer I will read and reread multiple times to get lots out of. It took me pretty long to even read through it once. Thank you! – Apoorva Anand Apr 24 '20 at 07:53
  • You bet, this is just the long version. @4386427 provided a very fine answer for you. – David C. Rankin Apr 24 '20 at 07:54
1

This answer only focus on this part:

how would one go about freeing the memory of p in readlines.

In principle all you need is to iterate the pointers in the memory that pa points to and free them one by one. Like

for (int i = 0; i < n; ++i) free(pa[i]);
free(pa);

However, there is one minor problem: You can't know how many of those pointers that have been assigned a malloced value in the readlines.

To get around that problem, you can initialize all the pointers to NULL. Then it is safe to call free on all pointers because it's always valid to call free with a NULL-pointer.

Like:

pa = malloc(n * sizeof(char*));           // or better: pa = malloc(n * sizeof *pa);
for (int i = 0; i < n; ++i) pa[i] = NULL; // make all pointers equal to NULL

... do your stuff ...

for (int i = 0; i < n; ++i) free(pa[i]);
free(pa);

Note: You could use calloc instead of malloc and avoid the initialization loop. However, in order to keep things simple, I continued with the malloc

That said, there is another problem here:

pa[++nlines % n] = p;

Here you overwrite the pointers that pa points to. So you may overwrite a pointer to some malloced memory - that's bad. Make sure to call free first:

int tmp = ++nlines % n;
free(pa[tmp]);            // pa[tmp] may be NULL but that is OK
pa[tmp] = p;

This solution requires the NULL initialization of the pointers that pa points to.

BTW: This code line will work

(argc == 1) ? (n = 10) : (n = atoi(*++argv));

but in my opnion it has a "smell".

I would keep it more simple like:

int n = 10;
if (argc == 2)
{
    n = atoi(argv[1]);  
}

Further, atoi isn't the best solution - see Why shouldn't I use atoi()?

Support Ukraine
  • 42,271
  • 4
  • 38
  • 63
  • Thank you! I didn't even know you could selectively free a part of an array with ```free(pa[tmp])```! – Apoorva Anand Apr 24 '20 at 07:30
  • 1
    @ApoorvaAnand Well, they are not really "part of an array". Your array is **an array of pointers**. Each of these pointers points to some memory **outside** the array and that memory can (and must) be free'ed one by one - just like it was `malloc`ed one by one. The "array" is free'ed by `free(pa)` – Support Ukraine Apr 24 '20 at 07:33