K&R Exercise 1-21 - Mental incomprehension

Question

The "impossible" K&R exercise.

"Write a program entab that replaces strings of blanks by the minimum number of tabs and blanks to achieve the same spacing. Use the same tab stops, say every n columns. Should n be a variable or a symbolic parameter?"

The problem I'm having is, I'm unsure about how to even do this correctly. I know it's not very explanatory, but that's pretty much the problem here. Most of the examples I've seen have counted a number of blanks, and replaced those series with a tab, but this isn't what its asking, I reckon I understand what its asking, but currently feel unable to do this.

Could anyone help :)

Edit: The code I've written so far can be found here.

Steven: Hello! managed to write some bad code :) http://codepad.org/9xESoMQ0 My question is what should my code be doing, that it isn't How should I attack this problem? — asdfg, Jun 11 '10 at 03:44
What is your understanding of the task, and how does that differ from the examples you're seeing? — Jim Lewis, Jun 11 '10 at 03:49
Jim: Thanks for responding! A tabstop unlike other characters are independent of location in a string, so a starting position of a tab, and the ending position are always constant, and doesn't rely on where the current position of text actually is. This differs from prior exercises because it requires a lot larger mental leap :) — asdfg, Jun 11 '10 at 03:55
An argument could be made that the input is assumed to consist of all blanks, which considerably simplifies the problem! But given the context (the preceding `detab` exercise) you're probably right, that's not what they're looking for. — Jim Lewis, Jun 11 '10 at 04:17

Bill Forster · Answer 1 · 2010-06-14T06:24:07.553

If your question is "What is this asking me to do?" I think I can help by paraphrasing the original question (posing the same question in a different way).

Write a program that takes as input text with spaces and produces as output visually equivalent text using tabs to the maximum extent possible.

For example, with tabstops every 8 characters, and showing spaces as '.' and tabs as '-';

input;
".foo:...bar;......#comment"
output;
".foo:-bar;-..#comment"

input;
".......-foo:.....bar;......#comment"
output;
"-foo:-.bar;-...#comment"

Write the program so that tabstop parameter n can be varied, i.e. allow values of n other than 8. Be prepared to justify your decision to make n a constant, or alternatively a variable.

Edit I had a look at your code and I think it is more complex than it needs to be. My advice is to do it a character at a time. There's no need to buffer a whole line. Maintain a column count as you read each character ('\n' resets it to zero, '\t' bumps it by 1 or more, other characters increment it). When you see a space (or tab), don't emit anything right away, start your entabbing process, emit zero or more tabs and then spaces later (at '\n' or a non whitespace character, whichever comes first).

A final hint is that a state machine can make this kind of algorithm a lot easier to write, validate, test and read.

Edit 2 In a shameless attempt to get the OP to accept my answer, I have now gone ahead and actually coded a solution myself, based on the hints I offered above and my comment in the discussion.

// K&R Exercise 1-21, entab program, for Stackoverflow.com
#include <stdio.h>
#define N 4     // Tabstop value. Todo, make this a variable, allow
                //  user to modify it using command line

int main()
{
    int col=0, base_col=0, entab=0;

    // Loop replacing spaces with tabs to the maximum extent
    int c=getchar();
    while( c != EOF )
    {

        // Normal state
        if( !entab )
        {

            // If whitespace goto entab state
            if( c==' ' || c=='\t' )
            {
                entab = 1;
                base_col = col;
            }

            // Else emit character
            else
                putchar(c);
        }

        // Entab state
        else
        {

            // Trim trailing whitespace
            if( c == '\n' )
            {
                entab = 0;
                putchar( '\n' );
            }

            // If not whitespace, exit entab state
            else if( c!=' ' && c!='\t' )
            {
                entab = 0;

                // Emit tabs to get close to current column position
                //  eg base_col=1, N=4, col=10
                //  base_col + 3 = 4 (1st time thru loop)
                //  base_col + 4 = 8 (2nd time thru loop)
                while( (base_col + (N-base_col%N)) <= col )
                {
                    base_col += (N-base_col%N);
                    putchar( '\t' );
                }

                // Emit spaces to close onto current column position
                // eg base_col=1, N=4, col=10
                //  base_col -> 8, and two tabs emitted above
                //  base_col + 1 = 9 (1st time thru this loop)
                //  base_col + 1 = 10 (2nd time thru this loop)
                while( (base_col + 1) <= col )
                {
                    base_col++;
                    putchar( ' ' );
                }

                // Emit buffered character after tabs and spaces
                putchar( c );
            }
        }

        // Update current column position for either state
        if( c == '\t' )
            col += (N - col%N); // eg col=1, N=4, col+=3
        else if( c == '\n' )
            col=0;
        else
            col++;

        // End loop
        c = getchar();
    }
    return 0;
}

The main catch is dealing with 3 blanks followed by a tab at the start of a line, or other similar sequences of interleaved blanks and tabs. You need to omit the 3 blanks from the output for any tab stop of size 4 or larger. — Jonathan Leffler, Jun 11 '10 at 04:58
@Jonathan Leffler. Good point, I have added an example that illustrates that wrinkle. — Bill Forster, Jun 11 '10 at 05:31
I understand what you're saying, but I don't understand how to articulate it in code. :( — asdfg, Jun 11 '10 at 07:28
Where you say " ... '\t' bumps it up by 1 or more, other characters increment it", could you elaborate please? — asdfg, Jun 11 '10 at 07:29
@asdfg I wonder if I can do this in a comment; Initially col=0. We read 'a'. Emit 'a' and now col=1. Read 'b'. Emit 'b' and now col=2. Read ' '. Go into entab state, set base_col=2, emit nothing and now col=3. Read '\t'. Stay in entab state, calculate new col=8 (so it increased by 5 in this case). Read ' '. Stay in entab state now col=9. Read 'c'. Exit entab state, use base_col and col to calculate how many tabs then spaces to emit, then emit '\t',' '. Then emit 'c' and increment col to col=10. So input so far "ab.-.c", output so far "ab-.c" (one space has been eliminated). Get it ? — Bill Forster, Jun 11 '10 at 22:12
@asdfg Just alerting you that I've now written the code to go with my analysis (Edit 2 in my answer) — Bill Forster, Jun 14 '10 at 21:49
Bill, I just wrote a testcase, here it is: http://codepad.org/9LKvVB6V Received a bit more help in terms of a blueprint on how it'd be done, and pretty much most of the code, but I understand it now I guess :x thanks — asdfg, Jun 15 '10 at 06:29

score 2 · Answer 2 · answered Feb 14 '18 at 22:37

I'm a bit late, but here's how I solved it myself. It's a different approach than what has been shared above, so please share any comments/feedback if you have any.

Check out my public gist on Github for the source code. There's comments on the code, and the approach is explained on the top of the file, but I'll copy and paste it here just so that the logic is clear from the get-go.

Approach:

We'll keep track of number of spaces encountered (between nontab/nonspace characters)

We'll keep track of characters (that aren't tabs/blanks/newlines) per input line

We'll evaluate the "gaps" generated by spaces by:

Evaluating whether the number of spaces in between those characters.

A gap will be "big enough" when the number of spaces is >= TABSIZE

Then, for all the left over spaces in our "buffer", we'll print them out individually

Finally, we print out the character that was read in (which was not a tab/blank)

As well as updating space count and character count if necessary.

I'm still a novice programmer in all senses, so I am not sure of how it would compare vs the other solutions posted in here, but the logic seems easier to follow (at least for me).

Hope this helps someone later on!

Elliott · Answer 3 · 2010-06-11T04:40:40.170

I agree with your assessment. It won't be enough to replace every n blanks with a tab; for example, if n == 4, "hi blank blank blank blank" should not be replaced by "hi tab", but rather by "hi tab blank blank".

It sounds like what you need to do is keep track of the current position as you're reading in each line, and use this information to determine how many tabs you need. Does this help? Please let me know if you need more details!

As for the "variable vs. symbolic parameter" part, either would definitely be viable, but I can think of one significant advantage to using a variable: you can run the program for different values of n without recompiling.

score 0 · Answer 4 · answered Nov 11 '14 at 13:42

In the top rated answer above, the program is overly complex. In an attempt to simplify that part of the answer, I've attached a much simpler code hopefully written in the style of K&R (mostly by incrementing inline with ++).

include

define TAB 4

int main(){

char newsentence[255],c;
int spacecount = 0, oldsentencepointer = 0, newsentencepointer = 0;

printf("Give me a sentence please:\n");

while ((c = getchar()) != '\n') {
    if ((oldsentencepointer != 0) && (oldsentencepointer % TAB == 0) && (spacecount > 0))
       {
        newsentencepointer -= spacecount;         //if at tabstop, and spaces and not
                                                    first, go back to 1st space, set tab.
        newsentence[newsentencepointer++] = '\t';
        spacecount = 0;
        }

    if (c == ' ') {
        newsentence[newsentencepointer++] = ' ';
        spacecount++;                       //keep track of spaces before tab stop
    }

    else if (c == '\t') {
        newsentence[newsentencepointer++] = '\t' ;
        oldsentencepointer = TAB;   //set old pointer to TAB (does not matter if actual,
                                      only cadence important)
        continue;                   //continue from here so as not to increment 
                                      old sentence counter.
        }

    else {
        newsentence[newsentencepointer++] = c ;   //write whatever was old into new.
        spacecount = 0;                           //reset space counter.
        }

    oldsentencepointer++;

}

newsentence[newsentencepointer] = '\0';    //cap it off.

puts(newsentence);

return 0;

}

score 0 · Answer 5 · answered Jun 11 '10 at 03:46

My understanding is that you don't really have to know what the problem is or how to solve it in order to answer this question. The question seems to asking whether you understand when to use variables instead of "symbolic parameters". I'm not actually sure what is meant by "symbolic parameter"; it seems to be outdated nomenclature.

Having said that, solving the first part of the question (replacing spaces with tabs) is rather straight forward. Think division and remainders.

I think symbolic parameter here would be a #define – asdfg Jun 11 '10 at 03:53 — asdfg, Jun 11 '10 at 03:53

score 0 · Answer 6 · answered Jun 11 '10 at 04:26

I took a very cursory look at your code, and nothing is jumping out at me as blatantly wrong.

So my advice would be to either single-step through a few input examples in a debugger, examining variable values as you go, or add a whole bunch of debugging print statements. In either case, your goal is to find the point where the state of the program starts to deviate from what you expected or intended.

score 0 · Answer 7 · edited Sep 04 '10 at 22:39

0

I am currently plowing KnR and came across this page:

Answers to Exercises

Your exercise are located under:

Solutions
- Chapter 1 - A Tutorial Introduction
  - Ex No 21 Pg No 34
    
    users.powernet.co.uk/eton/kandr2/krx121.html

Hopefully you find this useful.

Sincerely, Morpfh

1: http://users.powernet.co.uk/eton/kandr2/index.html "The C Programming Language", 2nd edition, Kernighan and Ritchie - Answers to Exercises

edited Sep 04 '10 at 22:39

jpalecek

47,058
7
102
144

answered Sep 04 '10 at 22:20

Morpfh

1

1

Yet another resource. Some notes to the book divided into sections by chapter. Seems to be some good points here - the most important being : Read the book carefully. Important things are stated without going on and on about it. http://www.eskimo.com/~scs/cclass/krnotes/top.html ; Notes to Accompany The C Programming Language, by Kernighan and Ritchie (``K&R'') – Morpfh Sep 06 '10 at 00:37

score 0 · Answer 8 · answered Nov 13 '16 at 03:29

There is an even more concise solution, although it does not employ the best code practices available (abusing short circuit evaluation, awkward control flow via continue, somewhat weird "space" loop).

#include <stdio.h>

#define TS 8

int main(int arg, char *argv[]) {
    int counter = 0, space_counter = 0, c;
    while ((c = getchar()) != EOF) {
        ++counter;
        if (c == ' ' && ++space_counter && (counter % TS) == 0) {
            space_counter = 0;
            c = '\t';
        } else if (c == '\t') {
            counter = space_counter = 0;
        } else if (c != ' ') {
            while (space_counter--)
                putchar(' ');
            space_counter = 0;
            if (c == '\n')
                counter = 0;
        } else {
            continue; /* don't call putchar(c) */
        }
        putchar(c);
    }
    return 0;
}

Except for blanks, every character that is read is printed verbatim. Blanks are counted instead. If the program encounters a non-blank character, it prints as many blanks as it has counted before, resetting that counter afterwards. If it encounters a blank, it checks via a second counter (printed characters since the beginning of the line/last tabstop) if the cursor is on a tabstop. If it is, a tab is printed, otherwise the blank is just counted.

A tab in the input is dealt with resetting the space counter and outputting the tab, eliminating any superfluous blanks in the process.

K&R Exercise 1-21 - Mental incomprehension

8 Answers8

include

define TAB 4