0

Say I want to create a String that will hold some values based on another string. Basically, I want to be able to compress one string, like this: aaabb -> a3b2. My question is about building up the result string.

In Java you could do something like this:

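String mystr = "";
String original = "aaabb";

char last = original.charAt(0);
int count = 1; // length of the current run
for (int i = 1; i < original.length(); i++) {
  if (original.charAt(i) == last) {
    count++;
  } else {
    mystr += last + "" + count; // Here is my doubt.
    last = original.charAt(i);
    count = 1;
  }
}
mystr += last + "" + count; // append the final run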

As you can see, we have initialized an empty string and we can modify it (mystr += last + "" + count;). How can you do that in C?

Jason C
StarTrek18
  • You're not __modifying__ the string. Java strings are immutable. You are creating a new string and pointing mystr to the new string. C strings are just character arrays (can be modified) – avmohan Mar 30 '14 at 23:24
  • It can be done with sprintf and strcat, but be prepared: for string manipulation, C is a pain in the neck. Can't you use C++? – gd1 Mar 30 '14 at 23:24
  • If you were using C++ you could use `std::string` or `std::stringstream`. Note that both of these are mutable, whereas in Java, strings are not mutable; but in your sample code, it would be the same end effect. – Jason C Mar 30 '14 at 23:25
  • Why do C++ people complain about people using C APIs in C++ and claim they're completely independent languages, but then answer C questions with C++ answers? ;-) – James M Mar 30 '14 at 23:26
  • I don't understand. If they are not mutable, how does this work? You are actually modifying mystr. – StarTrek18 Mar 30 '14 at 23:26
  • @StarTrek18 No. You are creating a new `String` and storing the reference to that new string in `mystr`. That's why e.g. `StringBuilder` can sometimes offer performance gains (although in many cases the compiler will drop in a `StringBuilder` behind the scenes for string appends anyways). – Jason C Mar 30 '14 at 23:27
  • @JasonC Compiler drops in StringBuilder automatically?? ooh, i didn't know that. – avmohan Mar 30 '14 at 23:27
  • Oh ok. So, there is no easy way of doing this in C? – StarTrek18 Mar 30 '14 at 23:28
  • With the standard library, not really. You can use a string library that makes this easier, or do what most C programs do and allocate a buffer big enough for all the concatenation you're going to do before you use it (then you can use `strcat` or similar.) – James M Mar 30 '14 at 23:29
  • @StarTrek18 It's easy to do it in C, depending on your definition of "easy". The simplest is probably to set up a fixed size buffer and use `strcat`, as already mentioned. You could also allocate a buffer as needed and use, e.g., `realloc()` to grow it. – Jason C Mar 30 '14 at 23:29
  • I am not saying C++ is "better" than C, but I want to save this person from the daunting inferno of string rocket science in C. :) – gd1 Mar 30 '14 at 23:31
  • @v3ga I meant `StringBuffer`, sorry, but yes. See http://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.18.1 and also http://stackoverflow.com/questions/1532461/stringbuilder-vs-string-concatenation-in-tostring-in-java – Jason C Mar 30 '14 at 23:33
  • You can pre-determine an upper bound for the length of the output string, and then allocate that in advance of parsing the input string. For example "ab" maps to "a1b1" (longer than original), "aaab" maps to "a3b1" (same length as original) and "aaaab" maps to "a4b1" (shorter than original). The worst case is an input string where each character differs from the previous one, and in this case the output length is exactly twice the input length. So pre-allocate a string of length 2*strlen(input) + 1. – jarmod Mar 30 '14 at 23:36
  • possible duplicate of [C String Concatenation](http://stackoverflow.com/questions/308695/c-string-concatenation) – Jason C Mar 30 '14 at 23:41
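
The upper bound from the comments (2*strlen(input) + 1) could be turned into code roughly as follows. This is only a sketch: the function name `rle_worst_case` is made up for illustration, and it assumes the "always write the count" format from the question (aaabb -> a3b2).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: run-length encodes src ("aaabb" -> "a3b2").
 * Returns a malloc'ed string the caller must free, or NULL on failure. */
char *rle_worst_case(const char *src)
{
    assert(src != NULL);
    size_t n = strlen(src);
    size_t cap = 2 * n + 1;   /* worst case: every run has length 1 */
    char *out = malloc(cap);
    if (out == NULL)
        return NULL;

    size_t pos = 0;
    size_t i = 0;
    while (i < n) {
        /* Measure the run of identical characters starting at i. */
        size_t run = 1;
        while (i + run < n && src[i + run] == src[i])
            run++;
        /* snprintf writes at most cap - pos bytes, terminator included. */
        int w = snprintf(out + pos, cap - pos, "%c%zu", src[i], run);
        assert(w > 0 && (size_t)w < cap - pos);
        pos += (size_t)w;
        i += run;
    }
    out[pos] = '\0';          /* also covers the empty input */
    return out;
}
```

Calling `rle_worst_case("aaabb")` should return a buffer holding "a3b2"; the caller is responsible for calling `free()` on it.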

3 Answers

4

Unfortunately, C does not make this as easy as Java: you have to manage the memory for the string yourself, typically with dynamic allocation.

There are three common choices here:

  1. Allocate as much as you could possibly need, then trim to size once you are done - This is very common, but it is also risky: if you miscalculate the maximum, you get a buffer overrun.
  2. Run your algorithm twice - the first time counting the length, and the second time filling in the data - This may be the most efficient one if the timing is dominated by memory allocation: this approach requires you to allocate only once, and you allocate the precise amount of memory.
  3. Allocate as you go - start with a short string, then use realloc when you need more memory.
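
As an illustration of the third option, here is a minimal sketch of a buffer that grows with realloc. The `struct strbuf` type and the `strbuf_append_run` function are hypothetical names invented for this example, not part of the C standard library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string, not a standard C type. */
struct strbuf {
    char  *data;  /* NUL-terminated once non-NULL */
    size_t len;   /* characters stored, excluding the terminator */
    size_t cap;   /* bytes currently allocated */
};

/* Appends one character followed by a decimal count.
 * Returns 0 on success, -1 on failure (the buffer is left unchanged). */
int strbuf_append_run(struct strbuf *sb, char c, size_t count)
{
    assert(sb != NULL);
    char piece[32];
    int w = snprintf(piece, sizeof piece, "%c%zu", c, count);
    if (w < 0 || (size_t)w >= sizeof piece)
        return -1;

    size_t need = sb->len + (size_t)w + 1;
    if (need > sb->cap) {
        /* Double the capacity until the new piece fits. */
        size_t new_cap = sb->cap ? sb->cap : 16;
        while (new_cap < need)
            new_cap *= 2;
        /* Use a temporary so the old block is not lost if realloc fails. */
        char *p = realloc(sb->data, new_cap);
        if (p == NULL)
            return -1;
        sb->data = p;
        sb->cap = new_cap;
    }
    memcpy(sb->data + sb->len, piece, (size_t)w + 1);  /* copies the terminator too */
    sb->len += (size_t)w;
    return 0;
}
```

Start from a zeroed struct (`struct strbuf sb = {0};`), call `strbuf_append_run(&sb, 'a', 3)` for each run, and `free(sb.data)` when done.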

I would recommend using the second approach. In your case, you would run through the source string once to compute the compressed length (here that's 5: four characters for the payload "a3b2", and one for the null terminator). With this information in hand, you allocate five bytes, then use the allocated buffer for the output, which is guaranteed to fit.
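
A rough sketch of the measure-then-fill idea, assuming a C99 compiler: calling snprintf with a NULL buffer and size 0 only reports how many characters it would write, so the same loop can serve both passes. The names `rle_pass` and `rle_two_pass` are invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper. When dst is NULL it only measures; otherwise it writes
 * into dst, which must hold at least the measured length plus one byte.
 * Returns the encoded length, excluding the terminator. */
static size_t rle_pass(const char *src, char *dst, size_t cap)
{
    size_t pos = 0;
    for (size_t i = 0; src[i] != '\0'; ) {
        /* The terminator differs from src[i], so this loop stops in bounds. */
        size_t run = 1;
        while (src[i + run] == src[i])
            run++;
        /* With a NULL buffer and size 0, snprintf only reports the length. */
        int w = snprintf(dst ? dst + pos : NULL, dst ? cap - pos : 0,
                         "%c%zu", src[i], run);
        assert(w > 0);
        pos += (size_t)w;
        i += run;
    }
    if (dst)
        dst[pos] = '\0';
    return pos;
}

/* Returns a malloc'ed, exactly sized encoding of src, or NULL on failure. */
char *rle_two_pass(const char *src)
{
    assert(src != NULL);
    size_t len = rle_pass(src, NULL, 0);           /* pass 1: measure */
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    size_t written = rle_pass(src, out, len + 1);  /* pass 2: fill */
    assert(written == len);
    (void)written;
    return out;
}
```

For "aaabb" the first pass returns 4, so exactly five bytes are allocated, matching the arithmetic above. The caller frees the result.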

Sergey Kalinichenko
  • I love the two pass approach (measure then fill). I used this in my JSON parser and it's consistently winning benchmarks - presumably because it never has to reallocate anything. – James M Mar 30 '14 at 23:32
0

In C (not C++) you can do something like this:

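char mystr[1024];
const char *str = "abcdef";  // string literals must not be modified
char c = str[1];  // will get 'b'
int int_num = 100;
// snprintf (from <stdio.h>) never writes more than sizeof mystr bytes
snprintf(mystr, sizeof mystr, "%s%c%d", str, c, int_num);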

This will create a string in 'mystr':

"abcdefb100"

You can then concatenate more data to this string using strcat()

strcat(mystr, "xyz"); // now it is "abcdefb100xyz"

Please note that mystr has been declared to be 1024 bytes long, and this is all the space you can use in it, including the terminating null character; strcat() does not check this limit for you. If you know how long your string will be, you can use malloc() to allocate the space and then use it.
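
Because the buffer has a fixed size, it can help to check the remaining space on every append. The following is one possible sketch; `append_run` is a hypothetical helper name, and it relies on snprintf returning the length it would have written.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: appends "<c><count>" to the string in buf if it fits
 * within cap bytes. Returns 0 on success, -1 if it would not fit, in which
 * case buf keeps its previous contents. */
int append_run(char *buf, size_t cap, char c, int count)
{
    assert(buf != NULL && cap > 0);
    size_t used = strlen(buf);
    assert(used < cap);
    int w = snprintf(buf + used, cap - used, "%c%d", c, count);
    if (w < 0 || (size_t)w >= cap - used) {
        buf[used] = '\0';  /* undo the truncated write */
        return -1;
    }
    return 0;
}
```

With `char mystr[1024] = "";`, calling `append_run(mystr, sizeof mystr, 'a', 3)` and then `append_run(mystr, sizeof mystr, 'b', 2)` should leave "a3b2" in mystr.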

C++ has much more robust ways of dealing with strings, if you want to use it.

DNT
0

You can use the string concatenation function strcat: http://www.cplusplus.com/reference/cstring/strcat/

Define your string as follows:

char mystr[1024] = ""; // Start with an empty string so strcat has something valid to append to; assumes 1024 bytes is enough, including the terminating zero

To convert the character last into a string to be able to concatenate it, you use the following syntax:

char lastString[2];
lastString[0] = last; // Set the current character from the for loop
lastString[1] = '\0'; // Set the null terminator

To convert the count into a string, you can use snprintf (itoa is not part of standard C, although some compilers provide it):

char countString[32];
snprintf(countString, sizeof countString, "%d", count); // Convert count to a decimal ASCII string

Then you can use strcat as follows:

strcat(mystr, lastString);
strcat(mystr, countString);
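
Putting these pieces together, a complete version might look like the sketch below. The function names `append_pair` and `compress_runs` are invented for this example, and the explicit length check is an addition, since strcat itself does no bounds checking.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: appends "<last><count>" to mystr with strcat.
 * Returns 0 on success, -1 if the result would not fit in cap bytes. */
static int append_pair(char *mystr, size_t cap, char last, int count)
{
    char lastString[2] = { last, '\0' };
    char countString[32];
    snprintf(countString, sizeof countString, "%d", count);

    /* strcat does no bounds checking, so check before calling it. */
    if (strlen(mystr) + strlen(lastString) + strlen(countString) + 1 > cap)
        return -1;
    strcat(mystr, lastString);
    strcat(mystr, countString);
    return 0;
}

/* Compresses original into mystr ("aaabb" -> "a3b2").
 * Returns 0 on success, -1 if mystr is too small. */
int compress_runs(const char *original, char *mystr, size_t cap)
{
    assert(original != NULL && mystr != NULL && cap > 0);
    mystr[0] = '\0';  /* strcat needs a terminated string to start from */
    if (original[0] == '\0')
        return 0;

    char last = original[0];
    int count = 1;
    for (size_t i = 1; original[i] != '\0'; i++) {
        if (original[i] == last) {
            count++;
        } else {
            if (append_pair(mystr, cap, last, count) != 0)
                return -1;
            last = original[i];
            count = 1;
        }
    }
    return append_pair(mystr, cap, last, count);  /* flush the final run */
}
```

Usage would be along the lines of `char mystr[1024]; compress_runs("aaabb", mystr, sizeof mystr);`, after which mystr should contain "a3b2".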

Another solution, if you can use C++ instead of C, is the std::string class, or MFC's CString if you are using Visual C++.

Bishoy