
I am trying to extract key/value pairs from a text file, but I am having trouble determining where a value ends. Here is a short snippet of the text file:

GIRRAFE: A tall spotted animal
LION: A short carnivore.
Prince: The son of a king.
Princess: The daughter of a king.

This is my code:

FILE *fp;
char line[20], word[20];
int i = 0, endind;

  fp = fopen(file, "r");
  if (fp==NULL){
    printf("Error parsing the file\n");
    exit(1);
  }
while (!feof(fp)){
  fgets(line, 100, fp);
      for (i;i<strlen(line);i++){
        if (line[i]=='.'){
          endind = i;
        }
      }
      for (i;i<endind;i++){
        word[i] = line[i];
          printf("%s\n",word);
      }


}

The code does not work well: I am not able to read a value that ends with a completely blank line.

Anthony
  • 1
    Unrelated, [don't do that: `while (!feof(fp))`](https://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong) – WhozCraig Jun 01 '18 at 21:20
  • but even if I did that, it doesn't solve storing a key and value, as that's the main issue – Anthony Jun 01 '18 at 21:48
  • 1
    Thus the carefully chosen use of the word, *"Unrelated"* – WhozCraig Jun 01 '18 at 21:49
  • That your `line` is declared as a 20 byte string, but you're willing to read 100 characters into it via your `fgets` call seems not a good thing? – Travis Griggs Jun 01 '18 at 22:34
  • Please do not vandalise your posts. Once you have submitted a post, you have licensed the content to the Stack Overflow community at large ([under the CC BY-SA license](https://creativecommons.org/licenses/by-sa/3.0/)). By SE policy, any vandalism will be reverted. – NobodyNada Jun 02 '18 at 21:47

2 Answers


From the sample data, it looks like the key ends at the first '.' in the string; use strchr(3) to find it. The value, and the whole item, appears to end with two newlines (that is, a blank line), so you will need code that reads a whole paragraph into a string. malloc(3) and realloc(3) are useful for that. If you have a known maximum size, you can of course use a fixed-size buffer.

Break the problem into parts. First, read a paragraph, then find where the key ends, then find where the value starts. Decide if the two newlines are part of the value, and whether the period is part of the key.

To read a paragraph, read in a line. If the line is empty, which you can determine with strcmp(line, "\n") == 0 (strcmp returns 0 when the strings are equal), then you're done reading the value, and you can move on. Otherwise, append the line to the paragraph buffer.
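The paragraph-reading step above could be sketched roughly as follows, using a fixed-size buffer rather than malloc/realloc. The helper name read_paragraph, the LINE_MAX_LEN limit, and the return codes are assumptions made for illustration; the sketch also assumes every line fits in the line buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LINE_MAX_LEN 256   /* assumed longest line, including '\n' */

/* Read lines from fp into para until a blank line or end of file.
 * Returns 1 if a paragraph was read, 0 at end of input, -1 if the
 * paragraph does not fit in size bytes. read_paragraph is a
 * hypothetical helper name, not part of any library. */
int read_paragraph(FILE *fp, char *para, size_t size)
{
    char line[LINE_MAX_LEN];
    size_t used = 0;

    assert(fp != NULL && para != NULL && size > 0);
    para[0] = '\0';

    while (fgets(line, sizeof line, fp)) {
        if (strcmp(line, "\n") == 0) {   /* blank line ends the paragraph */
            if (used == 0)
                continue;                /* skip leading blank lines */
            break;
        }
        size_t len = strlen(line);
        if (used + len + 1 > size)
            return -1;                   /* would overflow para */
        memcpy(para + used, line, len + 1);  /* copy including '\0' */
        used += len;
    }
    return used > 0;
}
```

Calling it in a loop, e.g. while (read_paragraph(fp, para, sizeof para) == 1), would hand you one item at a time with its newlines still embedded.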

Once you've got a whole paragraph as a single string, find the end of the key with char *keyend = strchr(para, '.'), which will return a pointer to the '.' character (or NULL if there is none). You can replace that character with a null character (*keyend = 0) and now para is a string with the key. Next, advance the keyend pointer past the null character to the first non-whitespace character. There are several ways to do that. At this point, keyend will point to the value. That gives you para as a pointer to the key, and keyend as a pointer to the value. Having that, you can update your hash table.
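As a minimal sketch of that split, assuming the paragraph is already in a writable buffer: the helper name split_entry and its return codes are made up for illustration, and it reports a missing '.' instead of guessing.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Split para in place at the first '.'. On success *key points at the
 * key and *value at the first non-whitespace character after the '.'.
 * Returns 0 on success, -1 if para contains no '.'.
 * split_entry is a hypothetical helper name. */
int split_entry(char *para, char **key, char **value)
{
    assert(para != NULL && key != NULL && value != NULL);

    char *keyend = strchr(para, '.');
    if (keyend == NULL)
        return -1;

    *keyend = '\0';              /* para is now just the key */
    keyend++;                    /* step past the terminator */
    while (isspace((unsigned char)*keyend))
        keyend++;                /* skip blanks before the value */

    *key = para;
    *value = keyend;
    return 0;
}
```

Both returned pointers refer into the original buffer, so copy them out before reusing that buffer for the next paragraph.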

I would also check for errors along the way, and probably use separate variables better named for the paragraph, key, and value. Trimming off the trailing newline and other data validation is optional. For example, what if a paragraph doesn't contain a '.' character at all?

user464502
  • That was a great explanation, thank you for going into so much detail and taking the time to do so. I have been using strtok to find a '.', which returns my key, but using strtok(NULL, "\n") stops at the value's first newline. Is there a way I could determine if a line is empty? – Anthony Jun 02 '18 at 02:24

You are on the right track. The simple way to determine if you have an empty line (in your case) is:

if (fgets(line, sizeof line, fp) && *line == '\n')
    // the line is empty (sizeof line assumes line is an array, as in your code)

(note: if (line[0] == '\n') is equivalent. In each case you are simply checking whether the 1st char in line is '\n'. Index notation line[x] is equivalent to pointer notation *(line + x), and since you are checking the 1st character (i.e. x = 0), pointer notation is simply *line)

While you are free to use strtok or any other means to locate the 1st '.', using strchr() or simply using a pointer to iterate (walk-down) the buffer until you find the first '.' is probably an easier way to go. Your parsing flow should look something like:

readdef = 0;  // flag telling us if we are reading word or definition
offset = 0;   // number of chars copied to definition buffer

read line {

    if (empty line (e.g. '\n')) {  // we have a full word + definition
        add definition to your list
        reset readdef flag = 0
        reset offset = 0
    }
    else if (readdef == 0) {  // line with word + 1st part of definition
        scan forward to 1st '.'
        check number of chars will fit in word buffer
        copy to word buffer (or add to your list, etc..)
        scan forward to start of definition (skip punct & whitespace)
        get length of remainder of line (so you can save offset to append)
        overwrite \n with ' ' to append subsequent parts of definition
        strcpy to defn (this is the 1st part of definition)
        update offset with length
        set readdef flag = 1
    }
    else {  // we are reading additional lines of definition
        get length of remainder of line (so you can save offset to append)
        check number of chars will fit in definition buffer
        snprintf to defn + offset (or you can use strcat)
        update offset with length
    }
}

add final definition to list

The key is looping and handling the different states of your input: an empty line (we have a word + full definition), readdef = 0 (we need to start a new word + definition), or readdef = 1 (we are adding lines to the current definition). You can think of this as a state loop. You are simply handling the different conditions (or states) presented by your input file. Note: you must add the final definition after your read loop (the last definition is still in your definition buffer when fgets returns NULL at end of file).

Below is a short example working with your data-file. It simply outputs the word/definition pairs -- where you would be adding them to your list. You can use any combination of strtok, strchr or walking a pointer as I do below to parse the data file into words and definitions. Remember, if you ever find a problem where you can't make strtok fit your data -- you can always walk a pointer down the buffer comparing each character as you go and responding as required to parse your data.

You can also use snprintf or strcat to add the multiple lines of definitions together (or simply a pointer and a loop), but avoid strncpy, especially for large buffers -- it has a few performance penalties as it zeros the unused space every time.
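To illustrate the snprintf option, here is a small sketch of a bounds-checked append that tracks the offset into the definition buffer. The helper name append_part and its return codes are assumptions for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append src to dst at *offset, where dst holds size bytes in total.
 * Returns 0 and advances *offset on success, -1 if src does not fit.
 * append_part is a hypothetical helper name. */
int append_part(char *dst, size_t size, size_t *offset, const char *src)
{
    assert(dst != NULL && offset != NULL && src != NULL && *offset < size);

    size_t avail = size - *offset;           /* bytes left, incl. '\0' */
    int n = snprintf(dst + *offset, avail, "%s", src);
    if (n < 0 || (size_t)n >= avail) {
        dst[*offset] = '\0';                 /* undo the partial write */
        return -1;
    }
    *offset += (size_t)n;
    return 0;
}
```

snprintf returns the length it would have written, so comparing that against the remaining space detects truncation without a separate strlen call.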

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define MAXW  128   /* max chars in word or phrase */
#define MAXC 1024   /* max char for read buffer and definition */

int main (int argc, char **argv) {

    int readdef = 0;        /* flag for reading definition */
    size_t offset = 0,      /* offset for each part of definition */
        len = 0;            /* length of each line */
    char buf[MAXC] = "",    /* read (line) buffer */
        word[MAXW] = "",    /* buffer storing word */
        defn[MAXC] = "";    /* buffer storing definition */
    /* open filename given as 1st argument, (or read stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    while (fgets (buf, MAXC, fp)) { /* read each line */

        char *p = buf;      /* pointer to parse word & 1st part of defn */

        if (*buf == '\n') {     /* empty-line, output definition */
            if (readdef) {      /* only if a definition was actually read */
                if (offset && defn[offset-1] == ' ')
                    defn[offset-1] = 0; /* remove trailing ' ' left for append */
                printf ("defn: %s\n\n", defn);
            }
            readdef = 0;        /* reset readdef flag - 0 */
            offset = 0;         /* reset offset - 0 */
        }
        else if (readdef == 0) {    /* line contains word + 1st part of defn */
            while (*p && *p != '.') /* find the first '.' */
                p++;
            if (p - buf + 1 > MAXW) {   /* make sure word fits in word */
                fprintf (stderr, "error: word exceeds %d chars.\n", MAXW - 1);
                return 1;
            }
            snprintf (word, p - buf + 1, "%s", buf);    /* copy to word */
            printf ("word: %s\n", word);                /* output word */
            while (ispunct ((unsigned char)*p) || isspace ((unsigned char)*p))  /* scan to start of defn */
                p++;
            len = strlen (p);               /* get length 1st part of defn */
            if (len && p[len - 1] == '\n')  /* chk \n, overwrite with ' ' */
                p[len - 1] = ' ';
            strcpy (defn, p);       /* copy rest of line to defn */
            offset += len;          /* update offset (no. of chars in defn) */
            readdef = 1;            /* set readdef flag - 1 */
        }
        else {                      /* line contains next part of defn */
            len = strlen (buf);                 /* get length */
            if (len && buf[len - 1] == '\n')    /* chk \n, overwrite w/' ' */
                buf[len - 1] = ' ';
            if (offset + len + 1 > MAXC) {      /* make sure it fits */
                fprintf (stderr, "error: definition exceeds %d chars.\n",
                        MAXC - 1);
                return 1;
            }
            snprintf (defn + offset, len + 1, "%s", buf);   /* append defn */
            offset += len;  /* update offset */
        }
    }
    if (fp != stdin) fclose (fp);   /* close file if not stdin */

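    if (readdef) {  /* a definition is still pending (no final empty line) */
        if (offset && defn[offset-1] == ' ')
            defn[offset-1] = 0;     /* remove trailing ' ' left for append */
        printf ("defn: %s\n\n", defn);      /* output final definition */
    }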

    return 0;
}

Example Input File

$ cat dat/definitions.txt
ACTE. A peninsula; the term was particularly applied by the ancients to
the sea-coast around Mount Athos.

ACT OF COURT. The decision of the court or judge on the verdict, or the
overruling of the court on a point of law.

TELEGRAPH, TO. To convey intelligence to a distance, through the medium
of signals.

TELESCOPIC OBJECTS. All those which are not visible to the unassisted
eye.

TELL OFF, TO. To divide a body of men into divisions and subdivisions,
preparatory to a special service.

TELL-TALE. A compass hanging face downwards from the beams in the cabin,
showing the position of the vessel's head. Also, an index in front of
the wheel to show the position of the tiller.

Example Use/Output

$ ./bin/read_def <dat/definitions.txt
word: ACTE
defn: A peninsula; the term was particularly applied by the ancients to the sea-coast around Mount Athos.

word: ACT OF COURT
defn: The decision of the court or judge on the verdict, or the overruling of the court on a point of law.

word: TELEGRAPH, TO
defn: To convey intelligence to a distance, through the medium of signals.

word: TELESCOPIC OBJECTS
defn: All those which are not visible to the unassisted eye.

word: TELL OFF, TO
defn: To divide a body of men into divisions and subdivisions, preparatory to a special service.

word: TELL-TALE
defn: A compass hanging face downwards from the beams in the cabin, showing the position of the vessel's head. Also, an index in front of the wheel to show the position of the tiller.

Look things over and let me know if you have further questions.

David C. Rankin
  • I'm glad you got your code running. Please take a look at: [What should I do when someone answers my question?](http://stackoverflow.com/help/someone-answers) – David C. Rankin Jun 03 '18 at 01:51