To start, you have several problems that will invoke Undefined Behaviour. In
char *line = (char*) malloc(sizeof(text));
sizeof (text) is the size of a pointer (char *), not the length of the buffer it points to. sizeof (char *) depends on your system, but is very likely to be 8 (go ahead and test this with printf("%zu\n", sizeof (char *)); if you are curious), which means line can hold a string of length 7 (plus the null-terminating byte). Long sentences will easily overflow this buffer, leading to UB.
(Aside: do not cast the return of malloc in C.)
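As a minimal sketch of sizing a buffer from the string's length rather than from the pointer's size (duplicate_text is a hypothetical helper name, not something from your code, and it assumes its argument is already null-terminated):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of `text`,
   or NULL if the allocation fails. The caller must free the result. */
char *duplicate_text(const char *text) {
    /* sizeof text would give the size of the pointer itself;
       strlen gives the number of characters before the '\0'. */
    size_t length = strlen(text);

    char *copy = malloc(length + 1); /* +1 for the null-terminating byte */
    if (!copy)
        return NULL;

    memcpy(copy, text, length + 1); /* copies the terminator as well */
    return copy;
}
```

Calling duplicate_text("Hello world! How are you?") yields a buffer large enough for the whole sentence, regardless of how big a pointer is on your system.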
Additionally, strlen(text) may not work properly, as text may not include the null-terminating byte ('\0'). fread works with raw bytes and does not understand the concept of a null-terminated string: files do not have to be null-terminated, and fread will not null-terminate buffers for you.
You should allocate one additional byte in the read_file function and place the null-terminating byte there:
text = malloc(num_bytes + 1);
text[num_bytes] = 0;
(Aside: sizeof (char) is guaranteed to be 1.)
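Note that using ftell to determine the length of a file should not be relied upon: for a text stream, the value it returns is only guaranteed to be meaningful as an argument to fseek, and a binary stream need not meaningfully support seeking to SEEK_END.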
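One way to avoid depending on ftell is to read the stream in chunks and grow the buffer as needed. The following is only a sketch under assumptions of my own: read_all is a hypothetical helper name, the 4096-byte starting capacity is arbitrary, and doubling is just one possible growth strategy.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything remaining in `file` into a
   null-terminated heap buffer. Returns NULL on a read or allocation
   failure. The caller must free the result. */
char *read_all(FILE *file) {
    size_t capacity = 4096; /* arbitrary starting size */
    size_t length = 0;
    char *buffer = malloc(capacity);
    if (!buffer)
        return NULL;

    for (;;) {
        /* Always keep one byte in reserve for the terminator. */
        size_t room = capacity - length - 1;
        size_t n = fread(buffer + length, 1, room, file);
        length += n;
        if (n == 0)
            break; /* end of file, or a read error (checked below) */

        if (length == capacity - 1) {
            /* Buffer is full: double it before reading more. */
            char *grown = realloc(buffer, capacity * 2);
            if (!grown) {
                free(buffer);
                return NULL;
            }
            buffer = grown;
            capacity *= 2;
        }
    }

    if (ferror(file)) {
        free(buffer);
        return NULL;
    }

    buffer[length] = '\0'; /* fread never adds this for you */
    return buffer;
}
```

This also works on streams that cannot be seeked at all, such as stdin attached to a pipe.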
isspace from <ctype.h> can be used to determine if the current character is whitespace. Its argument should be cast to unsigned char. Note this will include characters such as '\t' and '\n'. Use a simple comparison if you only care about spaces (text[i + 1] == ' ').
A loop can be used to consume the trailing whitespace after matching a delimiter.
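In isolation, the whitespace-skipping loop might look like the sketch below. skip_whitespace is a hypothetical helper name used only for illustration, and it assumes text is null-terminated, so the loop stops at '\0' (which isspace does not treat as whitespace).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns the index of the first non-whitespace
   character at or after position `i` in the null-terminated `text`. */
size_t skip_whitespace(const char *text, size_t i) {
    /* The cast keeps a negative char value from reaching isspace
       on platforms where plain char is signed. */
    while (isspace((unsigned char) text[i]))
        i++;
    return i;
}
```

For example, skip_whitespace("a. \t\nb", 2) returns 5, the index of 'b'.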
Make sure to null-terminate line before printing it, as %s expects a string.
Use %u to print an unsigned int.
Do not forget to free your dynamically allocated memory when you are done with it. Additionally, strongly consider checking that any library function that can fail has not done so.
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/* Print msg with the current errno description, then exit. */
void pdie(const char *msg) {
    perror(msg);
    exit(EXIT_FAILURE);
}

/* Read the whole file into a null-terminated, heap-allocated buffer. */
char *read_file(const char *name) {
    FILE *file = fopen(name, "r");
    if (!file)
        pdie(name);

    if (fseek(file, 0, SEEK_END) != 0)
        pdie(name);
    long num_bytes = ftell(file);
    if (-1 == num_bytes)
        pdie(name);
    if (fseek(file, 0, SEEK_SET) != 0)
        pdie(name);

    /* One extra byte for the null terminator, which fread does not add. */
    char *text = malloc(num_bytes + 1);
    if (!text)
        pdie("malloc");
    text[num_bytes] = 0;

    if (fread(text, 1, num_bytes, file) != (size_t) num_bytes)
        pdie(name);

    fclose(file);
    return text;
}

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s TEXT_FILE\n", argv[0]);
        return EXIT_FAILURE;
    }

    char *text = read_file(argv[1]);
    unsigned int count = 0;
    size_t length = strlen(text);
    size_t index = 0;

    /* A sentence can never be longer than the whole text. */
    char *line = malloc(length + 1);
    if (!line)
        pdie("malloc");

    for (size_t i = 0; i < length; i++) {
        line[index++] = text[i];

        if (text[i] == '.' || text[i] == '?' || text[i] == '!') {
            /* End of sentence: terminate, print, and reset the buffer. */
            line[index] = '\0';
            index = 0;
            printf("[%u] <<%s>>\n", ++count, line);

            /* Consume the whitespace that follows the delimiter. */
            while (isspace((unsigned char) text[i + 1]))
                i++;
        }
    }

    free(text);
    free(line);
    return EXIT_SUCCESS;
}
Input file:
My name is Maria. I'm 19. Hello world! How are you?
stdout:
[1] <<My name is Maria.>>
[2] <<I'm 19.>>
[3] <<Hello world!>>
[4] <<How are you?>>