Write words to a new file

Question

I have spent more than 3 hours on this task and still cannot work out how to implement it.

The task:

The user must enter a sentence containing separator characters (comma, dash).
Write the sentence to the file TF4_1.txt.
Create a new file TF4_2 and write to it:

        1) The word with one letter and the number of its repetition in the sentence.

        2) The word with two letters and the number of repetitions in the sentence.

        3) The word with three letters and the number of repetitions in the sentence.

Etc. up to 16.

The length of one word cannot exceed 16 characters. Only char arrays may be used; std::string is not allowed.

#include <fstream>
#include <iostream>
#include <cstring>

using namespace std;

const char text_separator[] = " ,./?!:();";

int smallest_srt()
{
    ifstream read_file("TF4_1.txt");
    char buff[256] = { NULL };
    read_file >> buff;
    char *token = strtok(buff, text_separator);
    int smallest = strlen(buff);
    while (token != NULL)
    {
        if (smallest > strlen(token))
            smallest = strlen(token);
        read_file.getline(buff, 256, '\n');
        token = strtok(buff, text_separator);
    }
    read_file.close();
    return smallest;
}

int main()
{
    char text_array[256];
    ofstream write_file;
    write_file.open("TF4_1.txt", ios::out | ios::trunc);
    cout << "Enter text: ";
    cin.getline(text_array, 256, '\n');
    write_file << text_array;
    write_file.close();
    int smallest = smallest_srt();
    ifstream read_file("TF4_1.txt");
    char buff[256] = { NULL };
    for (int i = smallest; i < 16; i++)
    {
        read_file.getline(buff, 256, '\n');
        char *token = strtok(buff, text_separator);
        while (token != NULL)
        {
            if (strcmp(world,token) && strlen(token) == i)
                n++;

        }
        //WRITE WORDS IN FILE; ??? (HOW?)
    return 0;
}

The code does not work and is not fully implemented. Please give at least a hint on how to fix it and make it more efficient.

Result

The user types: This is my test sentence. Is it fun?

I should get this in the new file:

 1. is - 2 times
 2. my - 1 times
 3. it - 1 times
 4. fun - 1 times
 5. test - 1 times
 6. this - 1 times

Recommendation: unless you are required to use a character array `char buff[256]`, use a `std::string`. `std::string buff;` is much more versatile and safer to use. And while I'm here, `NULL` is not a character. Use `'\0'` instead. — user4581301, Mar 19 '18 at 22:08
Yes, I know. But this is my lab work and my teacher requires me to use only `char`. @user4581301 — Den Andreychuk, Mar 19 '18 at 22:10
*"Look, the dog sat then lay down"*. Does the `the` in `then` count as an occurrence of your 3-letter word `the` in the sentence? (or are you only considering whole words?) — David C. Rankin, Mar 19 '18 at 22:14
Being required to use `char` sucks, but such is life. This means my next suggestion is pointless, so instead we should fix `strtok`. The first call to `strtok` gets the buffer you want parsed. Subsequent calls should use `NULL` for the buffer. When `strtok` sees `NULL` it starts where the last call left off. — user4581301, Mar 19 '18 at 22:14
`read_file >> buff;` will read only one word and doesn't respect the size limits on `buff`. `read_file.getline(buff, sizeof(buff));` is probably a better fit. — user4581301, Mar 19 '18 at 22:17
`int smallest;` should probably be `size_t smallest;` to match the return type of the `strlen` calls. This triggers a bit of refactoring of the `int`s in the program to `size_t`s — user4581301, Mar 19 '18 at 22:19
@DavidC.Rankin The result in the task is described in more detail. Check it. — Den Andreychuk, Mar 19 '18 at 22:24
`read_file.getline(buff, 256, '\n');` in the while loop will lead to much pain. Recommend writing a small program that just reads the file and writes out the words found stripped of separators. Once you have that sorted out you're in a good place to move onto placing the words you've found into the right bins according to the size of the tokens. — user4581301, Mar 19 '18 at 22:24
Interesting fun fact: Your test sentence is two sentences. You should probably call the input something else to reduce potential confusion. — user4581301, Mar 19 '18 at 22:26
Are you allowed to use `std::vector` or `std::map`? Either would make your job a lot easier. — user4581301, Mar 19 '18 at 22:28
If you're being taught C-style strings before `std::string` and you're forced to use them, you're probably a victim of bad teaching and you're better off with a [good book](https://stackoverflow.com/q/388242/9254539). C-style strings are a pain in the butt to use. — eesiraed, Mar 19 '18 at 23:40
If you think it is important to tell us you have only 3 hours to fix this, you have come to the wrong place. This is not a homework help site. — Raedwald, Mar 20 '18 at 05:43

David C. Rankin · Accepted Answer · 2018-03-20T05:13:45.513

One of the most costly tasks you can do in programming is file I/O. You want to minimize the number of file opens and reads (although you do get a default file buffer of BUFSIZ chars that helps: 8192 bytes on Linux, 512 on Windows).

The way you want to approach the task is to read your input once, process it as required, and then write the processed information once to the required files.

Here, according to your answers to my comments and your edit, you want to determine the number of times each word is seen (max of 16 chars per word), write the sentence entered by the user to "TF4_1.txt", and write the word frequency to "TF4_2.txt". (The sort order is not specified; add a call to qsort if a specific order is required.)

When you think about coordinating multiple pieces of information of differing types, you should immediately think struct. For two pieces of data, you can get away with multiple arrays, but generally, an array of struct that holds the information is preferred. Here you have a word and a count you want to keep for each individual word. You could declare a simple struct to handle your storage needs as follows:

#define MAXC 1024   /* if you need constants, define them    */
#define MAXL   32   /*    (don't skimp on buffer size)       */
#define MAXW  256   /* max chars in buf, word len, no. words */
...
typedef struct {       /* struct to associate word and count */
    char word[MAXL];
    int count;
} wstat;

(a typedef is used for convenience)

The remainder of the logic is fairly standard for this type of problem. You read your sentence, tokenize the string (in your case converting each token to lowercase), and loop over the words you have already stored, comparing the lowercase token to each stored word. If you find a match, you simply increment the count for that word; otherwise you copy the lowercase token to the next available element.word in your array of struct, then increment that element.count and the array-of-struct index.

You must also take care to protect your array bounds by comparing the index to the maximum number of elements each time you add a word.

When you are done processing each token, you simply write your array to "TF4_2.txt", close the file -- and you are done.

Putting it all together, you could do something similar to the following:

#include <iostream>
#include <iomanip>
#include <fstream>
#include <cstring>
#include <cctype>

using namespace std;

#define MAXC 1024   /* if you need constants, define them    */
#define MAXL   32   /*    (don't skimp on buffer size)       */
#define MAXW  256   /* max chars in buf, word len, no. words */

#define SENTOUT "TF4_1.txt"       /* sentence out filename   */
#define STATOUT "TF4_2.txt"       /* statistics out filename */

typedef struct {       /* struct to associate word and count */
    char word[MAXL];
    int count;
} wstat;

int main (void) {

    char buf[MAXC] = "",                /* buffer to hold line */
        *p = buf;                       /* pointer to buffer */
    const char *delim = " ,./?!:();";   /* strtok delimiters */
    int wcount = 0;                     /* word count */
    wstat wstats[MAXW] = {{"", 0}};     /* word stats array */

    /* prompt for input */
    cout << "enter sentence (words 16 char or less): ";
    if (!(cin.get (buf, MAXC, '\n'))) { /* validate input */
        cerr << "error: invalid input or user canceled.\n";
        return 1;
    }
    cout << buf << "\n";                /* output to stdout (optional) */
    ofstream f(SENTOUT, ios::trunc);    /* open TF4_1.txt for writing */
    if (!f.is_open()) {                 /* validate file open for writing */
        cerr << "error: file open failed '" << SENTOUT << "'.\n";
        return 1;
    }
    f << buf << "\n";                   /* write sentence to TF4_1.txt */
    f.close();                          /* close TF4_1.txt */

    /* tokenize input */
    for (p = strtok (p, delim); p; p = strtok (NULL, delim)) {
        int seen = 0;                   /* flag if word already seen */
        char lccopy[MAXL] = "",         /* array for lower-case copy */
            *rp = p,                    /* read-pointer to token */
            *wp = lccopy;               /* write-pointer for copy */
        while (*rp && wp < lccopy + MAXL - 1)   /* copy at most MAXL-1 chars */
            *wp++ = tolower((unsigned char)*rp++);  /* convert to lowercase */
        *wp = 0;                        /* nul-terminate lccopy */
        for (int i = 0; i < wcount; i++)    /* loop over stored words */
            /* compare lccopy to stored words */
            if (strcmp (lccopy, wstats[i].word) == 0) { /* already stored */
                wstats[i].count++;      /* increment count for word */
                seen = 1;               /* set seen flag */
            }
        if (!seen) {    /* if not already seen */
            strcpy (wstats[wcount].word, lccopy);   /* copy to wstats */
            wstats[wcount++].count++;   /* increment count for word */
            if (wcount == MAXW) {       /* protect array bounds */
                cerr << "maximum words reached: " << MAXW << "\n";
                break;
            }
        }
    }

    f.open (STATOUT, ios::trunc);       /* open TF4_2.txt */
    if (!f.is_open()) {                 /* validate file open for writing */
        cerr << "error: file open failed '" << STATOUT << "'.\n";
        return 1;
    }
    for (int i = 0; i < wcount; i++) {  /* loop over stored word stats */
        /* output to stdout (optional) */
        cout << " " << left << setw(16) << wstats[i].word << 
                "   " << wstats[i].count << "\n";
        /* output to TF4_2.txt */
        f << " " << left << setw(16) << wstats[i].word << 
            "   " << wstats[i].count << "\n";
    }
    f.close();                          /* close TF4_2.txt */
}

Example Use/Output

$ ./bin/wordlenfreq
enter sentence (words 16 char or less): This is my test sentence. Is it fun?
This is my test sentence. Is it fun?
 this               1
 is                 2
 my                 1
 test               1
 sentence           1
 it                 1
 fun                1

Example TF4_1.txt

$ cat TF4_1.txt
This is my test sentence. Is it fun?

Example TF4_2.txt

$ cat TF4_2.txt
 this               1
 is                 2
 my                 1
 test               1
 sentence           1
 it                 1
 fun                1

While it is always good to master the basic types, such as char, and to learn to keep track of the characters you fill and the element indexes you store, restricting yourself to char arrays means you might as well write the code in C. That would be a simple matter of changing the header file names and swapping fgets or POSIX getline for cin.get, printf (or fprintf) for cout and cerr, and fopen/fclose for the file stream open/close operations.

With C++, the string and vector types can make your job much easier. They handle the string and struct storage requirements for you and help you avoid writing beyond the bounds of your storage. (But note: strtok from <cstring> remains useful for tokenizing, because C++ getline accepts only a single delimiter character rather than a set of delimiters.)

Look things over and let me know if you have further questions.
