2

I'm trying to read a .txt file and save all sentences end with .!? into array. I use getline and strtok to do this. When I save the sentences, it seems work. But when I try to retrieve data later through index, the first line is missing.

The input is in a file input.txt with content below

The wandering earth! In 2058, the aging Sun? is about to turn into a red .giant and threatens to engulf the Earth's orbit!

Below is my code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main() {
    FILE *fp = fopen("input.txt", "r+");
    char *line = NULL;
    size_t len = 0;
    char *sentences[100];

    if (fp == NULL) {
        perror("Cannot open file!");
        exit(1);
    }

    char delimit[] = ".!?";
    int i = 0;

    while (getline(&line, &len, fp) != -1) { 
        char *p = strtok(line, delimit);
        
        while (p != NULL) {
            sentences[i] = p;
            printf("sentences [%d]=%s\n", i, sentences[i]);
            i++;
            p = strtok(NULL, delimit);
        }
    }

    for (int k = 0; k < i; k++) {
        printf("sentence is ----%s\n", sentences[k]);
    }

    return 0;
}

output is

sentences [0]=The wandering earth
sentences [1]= In 2058, the aging Sun
sentences [2]= is about to turn into a red 
sentences [3]=giant and threatens to engulf the Earth's orbit
sentence is ----
sentence is ---- In 2058, the aging Sun
sentence is ---- is about to turn into a red 
sentence is ----giant and threatens to engulf the Earth's orbit

I use strtok to split string directly. It worked fine.

chqrlie
  • 131,814
  • 10
  • 121
  • 189
Yadan Wei
  • 23
  • 3
  • 2
    Have you tried setting `line = NULL;` before looping to use `getline()` a second time? You want it to allocate a fresh buffer for each line loaded. – Fe2O3 Feb 01 '23 at 04:49
  • I didn't use getline in the second loop. – Yadan Wei Feb 01 '23 at 04:58
  • No you didn't. I can see that. I'm talking about "resetting" the pointer to NULL at the bottom of the first loop... – Fe2O3 Feb 01 '23 at 05:02
  • I tried as you suggested, but the first line is still missing. – Yadan Wei Feb 01 '23 at 05:13
  • I've just downloaded your code. Why do use an UPPERCASE 'I' here: `printf("sentences [%d]=%s\n", i, sentences[I]);` This suggests that what you've posted in your question is not the code you are compiling/executing. – Fe2O3 Feb 01 '23 at 05:18
  • I copied it from my code and didn't notice that error, now I change it back to lowercase. It is the version I am running. – Yadan Wei Feb 01 '23 at 05:23
  • Perhaps if you could publish the source text AS IT IS... There's something fishy about not seeing newlines that `getline()` should leave in the buffer, and strange leading SP on some lines that don't appear to be the effects of using `strtok()`... – Fe2O3 Feb 01 '23 at 05:32
  • I think I know the reason. It's because char *line is a pointer to a string, and strtok cannot change it like what this post says. https://stackoverflow.com/questions/4090434/strtok-char-array-versus-char-pointer, after I strdup(line), it works. – Yadan Wei Feb 01 '23 at 05:34
  • You've got it working. Good. The problem is the reference you mention discusses the difference between an "**Immutible** String Literal" and an ordinary character array (as would be allocated in the bowels of `getline()`... `strdup()` should not be necessary, but... I'll leave you to it... – Fe2O3 Feb 01 '23 at 05:38
  • 1
    @YadanWei `while (p != NULL)` loop should also test `i`. Example `while (p != NULL && i < 100)` – chux - Reinstate Monica Feb 01 '23 at 05:40
  • Yeah, I actually don't know why it works, if it needs a char array, it shouldn't work at the beginning instead of missing one line. I tried two text files, but the same issue. Thank you for taking the time to solve it. – Yadan Wei Feb 01 '23 at 05:44
  • @chux-ReinstateMonica `fopen(... "r+")` should be "r", perhaps "rt". Delimiters should also include `'\n'`, maybe `'\r'`... There's a few small troubles, yet... `:-)` Let's not talk about the memory leaking... `:-)` – Fe2O3 Feb 01 '23 at 05:47
  • @chux-ReinstateMonica I added what you say, the issue is still there. My input file is very short less than 100 sentences. Currently, I change `char *p = strtok(line, delimit);`to `char *p = strtok(strdup(line), delimit);`, it works. – Yadan Wei Feb 01 '23 at 05:51
  • Please edit question with the matching input to the output you provided. – Allan Wind Feb 01 '23 at 05:56
  • Is there an empty line after the line you showed in your text file? I assume your first line is just the result of an empty line read with second iteration of the loop as you are only using one string buffer for all lines. – Gerhardh Feb 01 '23 at 07:11

1 Answers1

2
  1. Change mode from "r+" to "r".
  2. Changed the list of delimiters from a variable to a constant DELIMITERS and added '\n'. You may or may not what that '\n' in there but I would need to see the expected output now that you supplied input. vim, at least, ends the last line with a '\n' which would generate at least one '\n' token at the end. The other option is to remove leading and trailing white space, and if you end up with an empty string then don't add it as a sentence.
  3. Introduced a constant for number of sentences, and ignore additional sentences beyond what we have space for.
  4. Combined the two strtok() calls (DRY).
  5. Eliminated the two memory leaks.
  6. If your input contains multiple lines the contents of line will be overwritten. This means the pointers in in sentences no longer make sense. The easiest fix is strdup() each string. Another approach would be to retain an array of line pointers (for subsequent free()) and have getline() allocate new a new line each time by resetting line = 0 and line = NULL.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define DELIMITERS ".!?\n"
#define SENTENCES_LEN 100

int main() {
    FILE *fp = fopen("input.txt", "r");
    if (!fp) {
        perror("Cannot open file!");
        return 1;
    }

    char *line = NULL;
    size_t len = 0;
    char *sentences[SENTENCES_LEN];
    int i = 0;
    while (getline(&line, &len, fp) != -1) { 
        char *s = line;
        for(; i < SENTENCES_LEN; i++) {
            char *sentence = strtok(s, DELIMITERS);
            if(!sentence)
                break;
            sentences[i] = strdup(sentence);
            printf("sentences [%d]=%s\n", i, sentences[i]);
            s = NULL;
        }
    }

    for (int k = 0; k < i; k++) {
        printf("sentence is ----%s\n", sentences[k]);
        free(sentences[k]);
    }

    free(line);
    fclose(fp);
}

Using the supplied input file the matching out is:

sentences [0]=The wandering earth
sentences [1]= In 2058, the aging Sun
sentences [2]= is about to turn into a red
sentences [3]=giant and threatens to engulf the Earth's orbit
sentence is ----The wandering earth
sentence is ---- In 2058, the aging Sun
sentence is ---- is about to turn into a red
sentence is ----giant and threatens to engulf the Earth's orbit
Allan Wind
  • 23,068
  • 5
  • 28
  • 38