Why fgets() simplifies things (see below for non-fgets() solution)
As @user3386109 indicated, when you want to read a line of input, you need to use a line-oriented input function such as fgets() or POSIX getline() to read an entire line's worth of words at a time.
You can then split the buffer filled by fgets() into tokens using strtok(). Here you want to split words based on the delimiters space or '\n' (you can add tab, ',' or whatever else you need as a delimiter to separate adjacent words). strtok() considers any number of sequential delimiters as a single delimiter, so you can't, for example, tokenize a line of comma-separated values if the line contains empty fields (e.g. 1,2,,4,... would be seen as 3-tokens, not 4 with one empty-field).
Another thing to understand is that strtok() modifies the string it operates on (it overwrites the delimiter that ends each token with '\0'). So if you need to preserve the original, make a copy. If you need to save the tokens (such as in an array), you need to copy the tokens to that storage (and you need to allocate storage for each token unless each one is copied into a fixed-size array).
If you have read your line into the buffer buf, then on the first call to strtok() you pass buf as the first parameter along with the delimiters to obtain the first token (word), e.g.
    char *p = strtok (buf, " \n");
After the first call to strtok(), to obtain a pointer to each remaining token you pass NULL as the first parameter along with the delimiters (you can change delimiters between calls), e.g.
    p = strtok (NULL, " \n");
When there are no more tokens, strtok() returns NULL.
Putting it together in a short example that uses each token to get the length of the word to store a sum of the characters, along with keeping a count of the tokens so the average length of each token (word) can be computed, you could do something similar to the following. (note: since none of the tokens are stored for later use, there is no need to copy the tokens to an array, etc.)
#include <stdio.h>
#include <string.h>

#define MAXC 1024

int main (void) {

    char buf[MAXC];

    while (fgets (buf, MAXC, stdin)) {      /* read each line */
        int n = 0, sum = 0;                 /* word count and sum of lengths */

        puts ("\ngroup:");
        /* separate each word in line */
        for (char *p = strtok (buf, " \n"); p; p = strtok (NULL, " \n")) {
            puts (p);
            sum += strlen (p);
            n++;
        }
        if (n)      /* only average if line held words (avoids divide by 0) */
            printf ("average len: %.2f\n", (double)sum / n);
    }
}
Note above, your read loop is controlled by the return of your read function itself (fgets()). fgets() returns NULL at EOF or on error, which ends the loop.
To test, you can create a short file with multiple words per-line and redirect it as input to your program, e.g.
Example Input File
$ cat dat/groups.txt
My dog has fleas
My cat has none
Lucky feline
Example Use/Output
$ ./bin/strgroupavglen < dat/groups.txt
group:
My
dog
has
fleas
average len: 3.25
group:
My
cat
has
none
average len: 3.00
group:
Lucky
feline
average len: 5.50
You can count characters to confirm the correct average length is computed. Look things over and let me know if you have further questions.
Without fgets() -- Using a State-Loop
Without fgets(), it's time to drop back to good old character-oriented input and a State Loop.
What's a State Loop?
In simple terms it is simply a loop over every item of input where you keep track of the state of whatever conditions need to be tracked to separate/isolate the information you need out of the collection as a whole. In this case, you want to collect characters into words and collect words into lines (groups) so you can sum the total characters and output the average per-line.
To keep track of the state of things, you will simply use state variables, otherwise known as flags. (a simple int variable set to 1 or 0 for true or false is all you need to keep track of a state/condition)
What conditions do you need to keep track of? You need to know if you are:
- in a word or in the space before, between or after a word (int inword;)
- in a line, so you can catch the end of line (int in_line; -- you can't use inline as a name, it is a keyword)
With those two states (or conditions), you can loop over every character of input and keep track of length, sum, word_count, etc. with a few counter variables and do exactly what you need (the old way -- manually). You can make your job easier by including the ctype.h header and making use of isspace() to determine if the character read is whitespace -- otherwise it is a character in a word.
Putting it all together, you could do:
#include <stdio.h>
#include <ctype.h>

#define MAXC 128

int main (void) {

    char word[MAXC];        /* array to hold word */
    int c,                  /* char to read from stdin */
        in_line = 0,        /* state variable in/out line */
        inword = 0,         /* state variable in/out word */
        len = 0,            /* word length */
        n = 0,              /* word count per-line */
        sum = 0;            /* sum of chars in words per-line */

    while ((c = getchar()) != EOF) {        /* read chars until EOF */
        if (isspace (c)) {                  /* if space */
            if (inword) {                   /* if inword */
                word[len] = 0;              /* nul-terminate word */
                puts (word);                /* output word */
                n++;                        /* increment word-count */
                sum += len;                 /* add length to sum */
                inword = 0;                 /* reset inword flag */
            }
            if (c == '\n' && in_line) {     /* if \n ends a line with words */
                printf ("average len: %.2f\n", (double)sum / n);
                in_line = 0;                /* reset in_line flag */
                n = 0;                      /* set word count 0 */
                sum = 0;                    /* set sum zero */
            }
            len = 0;        /* set length 0 if space of any kind */
        }
        else {                              /* if not space */
            if (!in_line) {                 /* if not in_line */
                puts ("\ngroup:");          /* output group heading */
                in_line = 1;                /* set in_line flag 1 */
            }
            if (len < MAXC - 1)             /* protect array bounds */
                word[len++] = c;            /* add char to word, increment len */
            inword = 1;                     /* set inword flag 1 */
        }
    }
    if (inword) {           /* word in progress at EOF (non-POSIX EOF) */
        word[len] = 0;      /* finish the word as done above */
        puts (word);
        n++;
        sum += len;
    }
    if (n)                  /* output average for final line without \n */
        printf ("average len: %.2f\n", (double)sum / n);
}
(note: the final two if statements handle a file that doesn't end with a final '\n' -- the first finishes a word still in progress at EOF, the second outputs the final average. Words longer than MAXC - 1 characters are truncated.)
(output is the same)
Without fgets() -- Using scanf()
Rounding out your options, you can also use scanf(). This is generally not a first-choice for new C programmers, as scanf() is full of pitfalls both in the use of the format-string and in accounting for characters left in stdin depending on the conversion specifier used.
In order to make scanf() work in your case, you must have a way to identify where one line ends. Otherwise, you won't be able to sum the characters per-line and output the average length. You can use a form of "%s%c" to read a word and the character that follows ("%s" stopping when it encounters the first whitespace or EOF). Always give "%s" a field-width of one less than the size of your buffer (e.g. "%127s" for a 128-byte buffer) to protect the array bounds. The " %[..]" form doesn't provide any benefit here as you would simply have to negate the character class with [^..] and include the whitespace as what not to read.
Unless you redo some of the logic of the State-Loop example above to track where you are in a line, the twist to reading input and producing the same output using scanf() has to do with controlling a heading for each group and only outputting the heading when there are words that follow. To accommodate this you can pre-read the character at the beginning of the line, ensure it isn't a '\n' or EOF, and then ungetc() the character to put it back in stdin.
Putting the pieces together with scanf(), you could do:
#include <stdio.h>
#include <string.h>

#define MAXC 128

int main (void) {

    int ch;     /* test character (int so it can hold EOF) */

    while ((ch = getchar()) != EOF && ch != '\n') {
        char buf[MAXC],             /* declare buffer */
             c = 0;                 /* character following each word */
        int n = 0, sum = 0;         /* count and sum */

        ungetc (ch, stdin);         /* put the char back */
        puts ("\ngroup:");          /* output groups header */

        while (1) {     /* loop until no word or until \n */
            /* read word & char, field-width of MAXC - 1 protects buf */
            int rtn = scanf (" %127s%c", buf, &c);
            if (rtn < 1)                    /* if no word, break */
                break;
            puts (buf);                     /* output word */
            sum += strlen (buf);            /* add length to sum */
            n++;                            /* increment word count */
            if (rtn < 2 || c == '\n')       /* if EOF or end of line, break */
                break;
        }
        if (n)      /* only average if words in group */
            printf ("average len: %.2f\n", (double)sum / n);
    }
}
(output is the same for the same input)
So you can do it without fgets() -- but I'll let you determine for yourself which simplifies the logic needed. Let me know if you have further questions.
Edit Requested To Provide Storage For Groups of Strings using fgets()
While the parsing of tokens (words) with fgets()/strtok() is simple, and providing storage for each word is simple (with strdup if you have it, or just allocating strlen() + 1 bytes), when you want to store groups of allocated strings in a single object -- you must allocate storage for (1) a pointer for each group, (2) a pointer for each string in each group, and (3) storage for each word.
You rapidly find you will have to use and allocate for 3-levels of pointer indirection. The general rule is that being a 3-Star Programmer isn't a compliment -- and if you find yourself attempting char ***pointer;, you should generally think of refactoring or rearranging your code to avoid one of the levels of indirection. However, for cases like this where you want to handle an unknown number of groups, each group containing an unknown number of strings, and each string containing an unknown number of characters, you don't have much of a choice.
Before looking at how to handle the allocations, let's look at a diagram that outlines the different allocations that will be needed, and the approach of making the last pointer in the block of group pointers, as well as the last string pointer in each group, NULL (providing a Sentinel NULL). This frees you from having to keep counters for the number of groups and strings per-group elsewhere in order to be able to iterate over your collection and get the information back out.
For example:
    Allocation 1        Allocation 2
    group pointers      string pointers
      +------+      +------+------+------+------+------+
      |  g1  | ---> |  s1  |  s2  |  s3  |  s4  | NULL |
      +------+      +------+------+------+------+------+
      |  g2  | ...     |      |      |      |
      +------+       +---+  +---+   ...    ....
      |  g3  | ...   | M |  | d |
      +------+       +---+  +---+
      | NULL |       | y |  | o |       Allocations 3+
      +------+       +---+  +---+       storage for each string
                     | \0|  | g |
                     +---+  +---+
                            | \0|
                            +---+
If this is your first exposure to handling dynamic allocation, then the complexity of allocating storage for 3-levels of indirection will seem daunting, but it is actually no more difficult than allocating/reallocating for any object -- the only twist is that you are nesting things 3-levels deep. So let's look at the normal allocation/reallocation for a single level.
If you need to allocate storage for a string and you don't know how long the string is beforehand, you can simply allocate some initial number of characters and keep track of the number of characters used. When used + 1 == allocated (the +1 saves room for the '\0' at the end), you realloc more storage and keep going. A simple example using getchar() to add an unknown number of characters to a string can be:
#include <stdio.h>
#include <stdlib.h>

int main (void) {

    int c, allocated = 2, used = 0; /* char, bytes-allocated, bytes-used */
    char *string = NULL;            /* pointer to allocated storage */

    string = malloc (allocated);    /* allocate initial storage for 2-chars */
    if (string == NULL) {           /* validate EVERY allocation */
        perror ("malloc-string");
        return 1;
    }

    while ((c = getchar()) != '\n' && c != EOF)
    {
        /* check if reallocation needed */
        if (used + 1 == allocated) {    /* recall +1 is needed for \0 at end */
            /* always realloc to a temporary pointer */
            void *tmp = realloc (string, 2 * allocated);
            if (tmp == NULL) {          /* validate EVERY reallocation */
                perror ("realloc-string");
                break;  /* storage pointed to by string still good, don't exit */
            }
            string = tmp;       /* assign reallocated block to string */
            allocated *= 2;     /* update allocated with new size */
        }
        string[used++] = c;     /* assign character to string */
    }
    string[used] = 0;           /* nul-terminate string */

    printf ("%s\nallocated: %d\nused     : %d\n", string, allocated, used);

    free (string);              /* don't forget to free what you allocate */
}
Example Use/Output
$ echo "My dog has fleas and my cat has none" | ./bin/reallocgetchar
My dog has fleas and my cat has none
allocated: 64
used : 36
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.
$ echo "My dog has fleas and my cat has none" | valgrind ./bin/reallocgetchar
==4893== Memcheck, a memory error detector
==4893== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==4893== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==4893== Command: ./bin/reallocgetchar
==4893==
My dog has fleas and my cat has none
allocated: 64
used : 36
==4893==
==4893== HEAP SUMMARY:
==4893== in use at exit: 0 bytes in 0 blocks
==4893== total heap usage: 8 allocs, 8 frees, 5,246 bytes allocated
==4893==
==4893== All heap blocks were freed -- no leaks are possible
==4893==
==4893== For counts of detected and suppressed errors, rerun with: -v
==4893== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
The approach is the same regardless. The only difference when you add levels of indirection is that you are reallocating storage for pointers, not characters (or integers, or structs, or whatever your base element may be). The approach is:
- Allocate some initial number of elements worth of storage, and keep track of the number of elements used and the number allocated.
- Check if reallocation is required, when (used == allocated).
- If so, always realloc to a temporary pointer -- when realloc fails, it returns NULL, and if you are blindly doing pointer = realloc (pointer, ..) you have just overwritten the address held by pointer with NULL, creating a memory leak.
- Validate EVERY allocation/reallocation.
- On successful reallocation, assign the reallocated block of memory to your pointer, update the variable tracking the number of elements allocated, and keep going....
With that as background, we can apply that approach 3-levels deep to read all strings into a group (each group having type char **, an allocated block of pointers where allocated storage for each string is assigned to each pointer) and tie each group of strings together through a final allocated block of pointers (of type char ***), with the address of the allocated block of pointers for each group assigned to a pointer within this final block.
Below, the number of groups allocated and used are tracked through the variables grpsalloced and grpsused, where grpsused becomes the index for the groups in our collection, e.g. groups[grpsused]. For the strings within each group, we track the pointers allocated and used with stralloced and strused, with the strused variable becoming the index to our string within the group, e.g. groups[grpsused][strused]. The [..] operates as a dereference of the pointer just as '*' does.
So to store each word parsed by strtok() as a string in a group and collect all groups together into a final object you can pass as a parameter or loop over independent of the read-loop, you could do:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NGROUPS    2    /* initial no. of groups to allocate */
#define MAXC    1024    /* max characters for read buffer */

int main (void) {

    char buf[MAXC];
    char ***groups = NULL;      /* being a 3-star programmer isn't a compliment */
                                /* but it is necessary to access groups of strings */
    int grpsalloced = NGROUPS,  /* initial no. of groups (pointers) allocated */
        grpsused = 0;           /* counter to track groups (pointers) used */

    groups = malloc (grpsalloced * sizeof *groups); /* allocate ngroups pointers */
    if (groups == NULL) {       /* validate EVERY allocation */
        perror ("malloc-groups");
        return 1;
    }

    /*
     * add groups of strings, reallocating as needed, ensure last pointer to
     * string in each group is NULL, and ensure last pointer to group is NULL
     */
    while (fgets (buf, MAXC, stdin)) {      /* read each line into buffer */
        int stralloced = NGROUPS,   /* reuse NGROUPS to set no. strings per-group */
            strused = 0;            /* strings used in group */

        /* allocate ptrs for current group, assign to next available groups pointer */
        groups[grpsused] = malloc (stralloced * sizeof *groups[grpsused]);
        if (!groups[grpsused]) {    /* validate group allocation */
            perror ("malloc-groups[grpsused]");
            break;      /* again, break, don't exit, any prior groups data still good */
        }
        groups[grpsused][0] = NULL; /* sentinel NULL in case line has no words */

        /* loop separating tokens (words) from buffer */
        for (char *p = strtok (buf, " \n"); p; p = strtok (NULL, " \n")) {
            size_t len = strlen (p);

            /* allocate storage for each token (word), use strdup if available */
            groups[grpsused][strused] = malloc (len + 1);
            if (!groups[grpsused][strused]) {   /* on failure ptr is sentinel NULL */
                perror ("malloc-groups[grpsused][strused]");
                break;
            }
            /* copy string to allocated storage */
            memcpy (groups[grpsused][strused], p, len + 1);
            strused++;      /* increment string count */

            /* check if more pointers for current group required */
            if (strused == stralloced) {
                void *tmp = realloc (groups[grpsused],  /* realloc str ptrs */
                                2 * stralloced * sizeof *groups[grpsused]);
                if (!tmp) {     /* validate reallocation */
                    perror ("realloc-groups[grpsused]");
                    /* no room for sentinel, give up last word to make room */
                    free (groups[grpsused][--strused]);
                    groups[grpsused][strused] = NULL;
                    break;
                }
                groups[grpsused] = tmp;     /* assign new block of pointers */
                stralloced *= 2;            /* update allocated pointer count */
            }
            groups[grpsused][strused] = NULL;   /* sentinel NULL at end str ptrs */
        }

        grpsused++;     /* increment groups used counter */
        if (grpsused == grpsalloced) {  /* when groups reallocation needed */
            /* always realloc to a temporary pointer, here doubling no. of pointers */
            void *tmp = realloc (groups, 2 * grpsalloced * sizeof *groups);
            if (!tmp) {
                perror ("realloc-groups");
                /* no room for sentinel, free the group just added to make room */
                grpsused--;
                for (char **s = groups[grpsused]; *s; s++)
                    free (*s);
                free (groups[grpsused]);
                break;  /* don't exit, earlier data for groups still valid */
            }
            groups = tmp;       /* assign reallocated block to groups */
            grpsalloced *= 2;   /* update no. of group ptrs allocated */
        }
    }
    groups[grpsused] = NULL;    /* sentinel NULL at end of group pointers */

    /*
     * iterate over group pointers, iterate over each string pointer in group
     */
    char ***g = groups;         /* pointer to groups */
    while (*g) {
        char **s = *g;          /* pointer to 1st string pointer in group */
        int n = 0, sum = 0;     /* integers for string counter and sum */

        puts ("\ngroup:");      /* output heading */
        while (*s) {            /* loop over each string pointer in group */
            puts (*s);          /* output string */
            sum += strlen (*s); /* add length to sum */
            free (*s);          /* free storage for string (can be done later) */
            s++;                /* advance to next string */
            n++;                /* increment string counter */
        }
        if (n)                  /* only average if words in group */
            printf ("average len: %.2f\n", (double)sum / n);    /* group result */
        free (*g);              /* free current group (can be done later) */
        g++;                    /* advance to next group pointer */
    }
    free (groups);              /* free memory for group pointers */
}
You can re-order the reallocations to come before the assignments, or leave them after the grpsused and strused values are incremented (as done above to ensure an empty pointer is available for the sentinel NULL).
This will complete the edits for you here -- further questions (outside what was done above) warrant a new question. (as this probably should have been)