String Formatting using C / C++

Question

Recently I was asked in an interview to convert the string "aabbbccccddddd" to "a2b3c4d5". The goal is to replace each repeated character with a single occurrence and a repeat count. Here 'a' is repeated twice in the input, so we have to write it as 'a2' in the output. Also I need to write a function to reverse the format back to the original one (e.g. from the string "a2b3c4d5" to "aabbbccccddddd"). I was free to use either C or C++. I wrote the below code, but the interviewer seemed to be not very happy with this. He asked me to try a smarter way than this.

In the below code, I used formatstring() to eliminate repeated chars by just adding the repeated count and used reverseformatstring() to convert back to the original string.

void formatstring(char* target, const char* source) {
  int charRepeatCount = 1;
  bool isFirstChar = true;
  while (*source != '\0') {
    if (isFirstChar) {
      // Always add the first character to the target
      isFirstChar = false;
      *target = *source;
      source++; target++;
    } else {
      // Compare the current char with previous one,
      // increment repeat count
      if (*source == *(source-1)) {
        charRepeatCount++;
        source++;
      } else {
        if (charRepeatCount > 1) {
          // Convert repeat count to string, append to the target
          char repeatStr[10];
          _snprintf(repeatStr, 10, "%i", charRepeatCount);
          int repeatCount = strlen(repeatStr);
          for (int i = 0; i < repeatCount; i++) {
            *target = repeatStr[i];
            target++;
          }
          charRepeatCount = 1; // Reset repeat count
        }
        *target = *source;
        source++; target++;
      }
    }
  }
  if (charRepeatCount > 1) {
    // Convert repeat count to string, append it to the target
    char repeatStr[10];
    _snprintf(repeatStr, 10, "%i", charRepeatCount);
    int repeatCount = strlen(repeatStr);
    for (int i = 0; i < repeatCount; i++) {
      *target = repeatStr[i];
      target++;
    }
  }
  *target = '\0';
}

void reverseformatstring(char* target, const char* source) {
  int charRepeatCount = 0;
  bool isFirstChar = true;
  while (*source != '\0') {
    if (isFirstChar) {
      // Always add the first character to the target
      isFirstChar = false;
      *target = *source;
      source++; target++;
    } else {
      // If current char is alpha, add it to the target
      if (isalpha(*source)) {
        *target = *source;
        target++; source++;
      } else {
        // Get repeat count of previous character
        while (isdigit(*source)) {
          int currentDigit = (*source) - '0';
          charRepeatCount = (charRepeatCount == 0) ?
              currentDigit : (charRepeatCount * 10 + currentDigit);
          source++;
        }
        // Decrement repeat count as we have already written
        // the first unique char to the target
        charRepeatCount--; 
        // Repeat the last char for this count
        while (charRepeatCount > 0) {
          *target = *(target - 1);
          target++;
          charRepeatCount--;
        }
      }
    }
  }
  *target = '\0';
}

I didn't find any issues with above code. Is there any other better way of doing this?

Remember that this is probably the interviewer's favourite question and has probably been thinking about it for months. — Bathsheba, Oct 26 '13 at 13:09
Apart from the weird indentation style, I find nothing bad. I think I'd have done this pretty much the same way. — , Oct 26 '13 at 13:10
@H2CO3, sorry for wrong indentation and thanks for your reply. — Prabu, Oct 26 '13 at 13:14
Also, since you have (supposedly) no errors in your code, I think this question may be a better fir for [CodeReview.SE]. — , Oct 26 '13 at 13:14
@Prabu No, I don't think cross-posting is a good idea. Leave it now as-is, that's just a suggestion for next time you consider asking a similar question. — , Oct 26 '13 at 13:17
If I asked this in interview, I'd be impressed by a candidate who offered to code the reverse solution first. This is because it's simpler (aside from knowing how large to make the target string) and provides a *testing framework* for the more difficult compression. — Bathsheba, Oct 26 '13 at 13:17
See [this](http://rosettacode.org/wiki/Run-length_encoding#C.2B.2B) too — P0W, Oct 26 '13 at 14:22

score 7 · Answer 1 · 2013-10-26T13:31:18.967

7

The approach/algorithm is fine, perhaps you could refine and shrink the code a bit (by doing something simpler, there's no need to solve this in an overly complex way). And choose an indentation style that actually makes sense.

A C solution:

void print_transform(const char *input)
{
    for (const char *s = input; *s;) {
        char current = *s;
        size_t count = 1;
        while (*++s == current) {
            count++;
        }

        if (count > 1) {
            printf("%c%zu", current, count);
        } else {
            putc(current, stdout);
        }
    }

    putc('\n', stdout);
}

(This can be easily modified so that it returns the transformed string instead, or writes it to a long enough buffer.)

A C++ solution:

std::string transform(const std::string &input)
{
    std::stringstream ss;
    std::string::const_iterator it = input.begin();

    while (it != input.end()) {
        char current = *it;
        std::size_t count = 1;
        while (++it != input.end() && *it == current) {
            count++;
        }

        if (count > 1) {
            ss << current << count;
        } else {
            ss << current;
        }
    }

    return ss.str();
}

edited Oct 26 '13 at 13:31

answered Oct 26 '13 at 13:20

Thanks for such a simple way. – Prabu Oct 26 '13 at 13:25
1

@Prabu Also see the C++ solution. (And always remember: there's only one thing that's more important than readability and simplicity, and that is correctness. Everything else (e. g. performance, "clever"-ness, etc.) has a lower precedence than readability. Go for the naive implementation whenever you can. – Oct 26 '13 at 13:32
1

Naive implementation is fine. As long as you use the idioms of the language. Don't write C code and claim that this is good C++. Also simple allows you to talk about it and suggest improvements. If the interviewer wants an optimized version you can always do that on a second iteration. – Martin York Oct 26 '13 at 13:48
@LokiAstari I didn't claim that the C version is good C++, that's why I added a C++ one. But yes, the other points are right! – Oct 26 '13 at 15:26
@H2CO3: Not pointing at your code. Just making a point about interviewing in general. :-) – Martin York Oct 26 '13 at 18:31
@LokiAstari Sorry, misunderstood you :-) – Oct 26 '13 at 20:08

score 7 · Answer 2 · edited May 23 '17 at 12:21

Since several others have suggested very reasonable alternatives, I'd like to offer some opinions on what I think is your underlying question: "He asked me to try a smarter way than this.... Is there any other better way of doing this?"

When I interview a developer, I'm looking for signals that tell me how she approaches a problem:

Most important, as H₂CO₃ noted, is correctness: will the code work? I'm usually happy to overlook small syntax errors (forgotten semicolons, mismatched parens or braces, and so on) if the algorithm is sensible.
Proper use of the language, especially if the candidate claims expertise or has had extensive experience. Does he understand and use idioms appropriately to write straightforward, uncomplicated code?
Can she explain her train of thought as she formulates her solution? Is it logical and coherent, or is it a shotgun approach? Is she able and willing to communicate well?
Does he account for edge cases? And if so, does the intrinsic algorithm handle them, or is everything a special case? Although I'm happiest if the initial algorithm "just works" for all cases, I think it's perfectly acceptable to start with a verbose approach that covers all cases (or simply to add a "TODO" comment, noting that more work needs to be done), and then simplifying later, when it may be easier to notice patterns or duplicated code.
Does she consider error-handling? Usually, if a candidate starts by asking whether she can assume the input is valid, or with a comment like, "If this were production code, I'd check for x, y, and z problems," I'll ask what she would do, then suggest she focus on a working algorithm for now and (maybe) come back to that later. But I'm disappointed if a candidate doesn't mention it.
Testing, testing, testing! How will the candidate verify his code works? Does he walk through the code and suggest test cases, or do I need to remind him? Are the test cases sensible? Will they cover the edge cases?
Optimization: as a final step, after everything works and has been validated, I'll sometimes ask the candidate if she can improve her code. Bonus points if she suggests it without my prodding; negative points if she spends a lot of effort worrying about it before the code even works.

Applying these ideas to the code you wrote, I'd make these observations:

Using const appropriately is a plus, as it shows familiarity with the language. During an interview I'd probably ask a question or two about why/when to use it.

The proper use of char pointers throughout the code is a good sign. I tend to be pedantic about making the data types explicit within comparisons, particularly during interviews, so I'm happy to see, e.g. while (*source != '\0') rather than the (common, correct, but IMO less careful) while(*source).

isFirstChar is a bit of a red flag, based on my "edge cases" point. When you declare a boolean to keep track of the code's state, there's often a way of re-framing the problem to handle the condition intrinsically. In this case, you can use charRepeatCount to decide if this is the first character in a possible series, so you won't need to test explicitly for the first character in the string.

By the same token, repeated code can also be a sign that an algorithm can be simplified. One improvement would be to move the conversion of charRepeatCount to a separate function. See below for an even better solution.

It's funny, but I've found that candidates rarely add comments to their code during interviews. Kudos for helpful ones, negative points for those of the ilk "Increment the counter" that add verbosity without information. It's generally accepted that, unless you're doing something weird (in which case you should reconsider what you've written), you should assume the person who reads your code is familiar with the programming language. So comments should explain your thought process, not translate the code back to English.

Excessive levels of nested conditionals or loops can also be a warning. You can eliminate one level of nesting by comparing each character to the next one instead of the previous one. This works even for the last character in the string, because it will be compared to the terminating null character, which won't match and can be treated like any other character.

There are simpler ways to convert charRepeatCount from an int to a string. For example, _snprintf() returns the number of bytes it "prints" to the string, so you can use
target += _snprintf(target, 10, "%i", charRepeatCount);

In the reversing function, you've used the ternary operator perfectly ... but it's not necessary to special-case the zero value: the math is the same regardless of its value. Again, there are also standard utility functions like atoi() that will convert the leading digits of a string into an integer for you.

Experienced developers will often include the increment or decrement operation as part of the condition in a loop, rather than as a separate statement at the bottom: while(charRepeatCount-- > 0). I'd raise an eyebrow but give you a point or two for humor and personality if you wrote this using the slide operator: while (charRepeatCount --> 0). But only if you'd promise not to use it in production.

Good luck with your interviewing!

Thanks for taking time and giving valuable feedback on my code. Really useful for intermediate developers like me. — Prabu, Oct 26 '13 at 16:47
"When I interview a developer, I'm looking for signals that tell me how SHE approaches a problem" - do you ever interview male developers? :) — kotlomoy, Oct 26 '13 at 19:15
+1, extremely useful answer. Loved it! It's always good to hear from someone who is on the other side of the fence. This is very good and valuable advice. I'm going to store this for future reference ;) — Filipe Gonçalves, Oct 26 '13 at 19:17
@kotlomoy You'll notice I used both "he" and "she" in my examples. Unfortunately, I still interview far more male than female developers because there's still a huge disparity in STEM careers. In almost 2 decades I've noticed absolutely no consistent difference in ability between the sexes. — Adam Liss, Oct 26 '13 at 22:59

Filipe Gonçalves · Answer 3 · 2013-10-26T17:42:16.810

5

I think your code is too complex for the task. Here's my approach (using C):

#include <ctype.h>
#include <stdio.h>

void format_str(char *target, char *source) {
    int count;
    char last;
    while (*source != '\0') {
        *target = *source;
        last = *target;
        target++;
        source++;
        for (count = 1; *source == last; source++, count++)
            ; /* Intentionally left blank */
        if (count > 1)
            target += sprintf(target, "%d", count);
    }
    *target = '\0';
}

void convert_back(char *target, char *source) {
    char last;
    int val;
    while (*source != '\0') {
        if (!isdigit((unsigned char) *source)) {
            last = *source;
            *target = last;
            target++;
            source++;
        }
        else {
            for (val = 0; isdigit((unsigned char) *source); val = val*10 + *source - '0', source++)
                ; /* Intentionally left blank */
            while (--val) {
                *target = last;
                target++;
            }
        }
    }
    *target = '\0';
}

format_str compresses the string, and convert_back uncompresses it.

edited Oct 26 '13 at 17:42

answered Oct 26 '13 at 13:37

Filipe Gonçalves

20,783
6
53
70

Thanks for yet another simpler solution. – Prabu Oct 26 '13 at 13:46
I have a small doubt. In format_str, in for loop, you are incrementing source pointer without checking for end ('\0'). Is this not needed really here ? – Prabu Oct 26 '13 at 14:13
@Prabu: It happens to work in this case because `*source == last` will be false when `*source == '\0'`. – Vaughn Cato Oct 26 '13 at 14:27
Exactly: due to the `while` loop condition, you are guaranteed that `last` is never `'\0'`. If `source` reaches `'\0'`, then `*source == last` evaluates to false and both loops (`for` and `while`) stop. – Filipe Gonçalves Oct 26 '13 at 14:29
@Prabu Note that in my C implementation, I don't do the check *inside* the loop. That is not a problem, since `while (*++s == current)` will take care of not running out of bounds. And comparing/accessing the NUL terminator is fine. It's part of the string. In the C++ implementation, though, I have to go with `(++it != input.end() && *it == current)`, because dereferencing the end iterator is undefined behavior. – Oct 26 '13 at 15:28
@FilipeGonçalves, Thanks for clarification and it is now very clear for me. – Prabu Oct 26 '13 at 16:29

score 0 · Answer 4 · answered Oct 26 '13 at 13:18

0

Your code "works", but it doesn't adhere to some common patterns used in C++. You should have:

used std::string instead of plain char* array(s)
pass that string as const reference to avoid modification, since you write the result somewhere else;
use C++11 features such as ranged based for loops and lambdas as well.

I think the interviewer's purpose was to test your ability to deal with the C++11 standard, since the algorithm itself was pretty trivial.

answered Oct 26 '13 at 13:18

Polentino

907
5
16

Will try this and let you know. – Prabu Oct 26 '13 at 13:26
1

In an interview that's what I am looking for. Especially for junior devs; These algorithms we ask you to write are designed to be simple enough that you can do multiple versions in an hour and chat about what you did and why between iterations. – Martin York Oct 26 '13 at 13:45
2

-1 OP said: "I was free to use either C or C++." So "I think the interviewer's purpose was to test your ability to deal with the C++11 standard" is wrong – kotlomoy Oct 26 '13 at 14:02
@kotlomoy: You are taking that too literally. If the answer had been in C++ then you want to check it adhered to C++ style and idioms. If the answer had been C then you want to use C style and idioms. Either way the interviewer is checking to see if the candidate knows how to use the language correctly (and this includes standard styles of usage). – Martin York Oct 26 '13 at 14:42
@LokiAstari "If the answer had been C then you want to use C style and idioms." - exactly. OP's answer was in C, so answer "Your code "works", but it doesn't adhere to some common patterns used in C++" is not relevant. Also assumption about interviewer's purpose is plain wrong. Hence the downvote. – kotlomoy Oct 26 '13 at 16:11

score 0 · Answer 5 · answered Oct 26 '13 at 13:57

Perhaps the interviewer wanted to test your knowledge of existing standard library tools. Here's how my take could look in C++:

#include <string>
#include <sstream>
#include <algorithm>
#include <iostream>

typedef std::string::const_iterator Iter;

std::string foo(Iter first, Iter last)
{
    Iter it = first;
    std::ostringstream result;
    while (it != last) {
        it = std::find_if(it, last, [=](char c){ return c != *it; });
        result << *first << (it - first);
        first = it;
    }
    return result.str();    
}

int main()
{
    std::string s = "aaabbbbbbccddde";
    std::cout << foo(s.begin(), s.end());
}

An extra check is needed for empty input.

score 0 · Answer 6 · answered Oct 26 '13 at 14:05

0

try this

std::string str="aabbbccccddddd";

for(int i=0;i<255;i++)
{
    int c=0;
    for(int j=0;j<str.length();j++)
    {
        if(str[j] == i)
            c++;
    }
    if(c>0)
    printf("%c%d",i,c);
}

answered Oct 26 '13 at 14:05

kunal

956
9
16

Apparently, this is not what OP needs – kotlomoy Oct 26 '13 at 19:13

score 0 · Answer 7 · answered Oct 26 '13 at 21:03

My naive approach:

void pack( char const * SrcStr, char * DstBuf ) {

    char const * Src_Ptr = SrcStr;
    char * Dst_Ptr = DstBuf;

    char c = 0;
    int RepeatCount = 1;

    while( '\0' != *Src_Ptr ) {

        c = *Dst_Ptr = *Src_Ptr;
        ++Src_Ptr; ++Dst_Ptr;

        for( RepeatCount = 1; *Src_Ptr == c; ++RepeatCount ) {
            ++Src_Ptr;
        }

        if( RepeatCount > 1 ) {
            Dst_Ptr += sprintf( Dst_Ptr, "%i", RepeatCount );
            RepeatCount = 1;
        }
    }

    *Dst_Ptr = '\0';
};

void unpack( char const * SrcStr, char * DstBuf ) {

    char const * Src_Ptr = SrcStr;
    char * Dst_Ptr = DstBuf;

    char c = 0;

    while( '\0' != *Src_Ptr ) {

        if( !isdigit( *Src_Ptr ) ) {
            c = *Dst_Ptr = *Src_Ptr;
            ++Src_Ptr; ++Dst_Ptr;

        } else {
            int repeat_count = strtol( Src_Ptr, (char**)&Src_Ptr, 10 );
            memset( Dst_Ptr, c, repeat_count - 1 );
            Dst_Ptr += repeat_count - 1;
        }
    }

    *Dst_Ptr = '\0';
};

But if interviewer asks for error-handling than solution turns to be much more complex (and ugly). My portable approach:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <ctype.h>

// for MSVC
#ifdef _WIN32
    #define snprintf sprintf_s
#endif

int pack( char const * SrcStr, char * DstBuf, size_t DstBuf_Size ) {

    int Err = 0;

    char const * Src_Ptr = SrcStr;
    char * Dst_Ptr = DstBuf;

    size_t SrcBuf_Size = strlen( SrcStr ) + 1;
    char const * SrcBuf_End = SrcStr + SrcBuf_Size;
    char const * DstBuf_End = DstBuf + DstBuf_Size;

    char c = 0;
    int RepeatCount = 1;

    // don't forget about buffers intercrossing
    if( !SrcStr || !DstBuf || 0 == DstBuf_Size \
        || (DstBuf < SrcBuf_End && DstBuf_End > SrcStr) ) {

        return 1;
    }

    // source string must contain no digits
    // check for destination buffer overflow
    while( '\0' != *Src_Ptr && Dst_Ptr < DstBuf_End - 1 \
        && !isdigit( *Src_Ptr ) && !Err ) {

        c = *Dst_Ptr = *Src_Ptr;
        ++Src_Ptr; ++Dst_Ptr;

        for( RepeatCount = 1; *Src_Ptr == c; ++RepeatCount ) {
            ++Src_Ptr;
        }

        if( RepeatCount > 1 ) {
            int res = snprintf( Dst_Ptr, DstBuf_End - Dst_Ptr - 1, "%i" \
                , RepeatCount );
            if( res < 0 ) {
                Err = 1;
            } else {
                Dst_Ptr += res;
                RepeatCount = 1;
            }
       }
    }

    *Dst_Ptr = '\0';

    return Err;
};

int unpack( char const * SrcStr, char * DstBuf, size_t DstBuf_Size ) {

    int Err = 0;

    char const * Src_Ptr = SrcStr;
    char * Dst_Ptr = DstBuf;

    size_t SrcBuf_Size = strlen( SrcStr ) + 1;
    char const * SrcBuf_End = SrcStr + SrcBuf_Size;
    char const * DstBuf_End = DstBuf + DstBuf_Size;

    char c = 0;

    // don't forget about buffers intercrossing
    // first character of source string must be non-digit
    if( !SrcStr || !DstBuf || 0 == DstBuf_Size \
        || (DstBuf < SrcBuf_End && DstBuf_End > SrcStr) || isdigit( SrcStr[0] ) ) {

        return 1;
    }

    // check for destination buffer overflow
    while( '\0' != *Src_Ptr && Dst_Ptr < DstBuf_End - 1 && !Err ) {

        if( !isdigit( *Src_Ptr ) ) {
            c = *Dst_Ptr = *Src_Ptr;
            ++Src_Ptr; ++Dst_Ptr;

        } else {
            int repeat_count = strtol( Src_Ptr, (char**)&Src_Ptr, 10 );
            if( !repeat_count || repeat_count - 1 > DstBuf_End - Dst_Ptr - 1 ) { 
                Err = 1;
            } else {
                memset( Dst_Ptr, c, repeat_count - 1 );
                Dst_Ptr += repeat_count - 1;
            }
        }
    }

    *Dst_Ptr = '\0';

    return Err;
};

int main() {

    char str[] = "aabbbccccddddd";
    char buf1[128] = {0};
    char buf2[128] = {0};

    pack( str, buf1, 128 );
    printf( "pack: %s -> %s\n", str, buf1 );

    unpack( buf1, buf2, 128 );
    printf( "unpack: %s -> %s\n", buf1, buf2 );

    return 0;
}

Test: http://ideone.com/Y7FNE3. Also works in MSVC.

score 0 · Answer 8 · answered Oct 26 '13 at 22:12

Try to make do with less boilerplate:

#include <iostream>
#include <iterator>
#include <sstream>
using namespace std;

template<typename in_iter,class ostream>
void torle(in_iter i, ostream &&o)
{
        while (char c = *i++) {
                size_t n = 1;
                while ( *i == c )
                        ++n, ++i;
                o<<c<<n;
        }
}

template<class istream, typename out_iter>
void fromrle(istream &&i, out_iter o)
{
        char c; size_t n;
        while (i>>c>>n)
                while (n--) *o++=c;
}

int main()
{
    typedef ostream_iterator<char> to;
    string line; stringstream converted;
    while (getline(cin,line)) {
        torle(begin(line),converted);
        cout<<converted.str()<<'\n';
        fromrle(converted,ostream_iterator<char>(cout));
        cout<<'\n';
    }
}

String Formatting using C / C++

8 Answers8