2

I would like to concatenate 2 strings in C or C++ without new memory allocation and copying. Is it possible?

Possible C code:

char* str1 = (char*)malloc(100);
char* str2 = (char*)malloc(50);
char* str3 = /* some code that concatenates these 2 strings
                without copying to occupy a continuous memory region */

Then, when I don't need them any more, I just do:

free(str1);
free(str2);

Or, if possible, I would like to achieve the same in C++, using std::string or maybe char*, but with new and delete (possibly the sized void operator delete(void* ptr, std::size_t sz) from C++14 on str3).

There are a lot of questions about string concatenation, but I haven't found one that asks the same thing.

rightaway717

4 Answers

7

No, it is not possible.

In C, each malloc call returns a block of memory that has no guaranteed relationship to any other block, yet a C string must be a contiguous array of bytes. So there is no guaranteed way to extend str1 in place without copying, let alone concatenate the two strings.

For C++, perhaps ropes may be of interest: See this answer.

Ropes are allocated in chunks that do not have to be contiguous, which supports O(1) concatenation, while the accessors make the rope appear as a single string of bytes. Converting a rope back to a std::string or a C-style string will almost certainly require a copy, but this is probably the closest to what you want.

Also, worrying about the cost of copying a few strings around is probably a premature optimization. Unless you are moving lots of data, it won't matter.

Anders
  • OP asks about concatenating strings without copying to continuous memory area. This is not possible in any language. – too honest for this site Dec 14 '15 at 16:37
  • 5
    @Olaf - did you read my answer: I said "No, it is not possible", sounds like an answer to me? Don't be rude. – Anders Dec 14 '15 at 16:39
  • @Olaf He didn't specify that the result of concatenation has to be a C string, so I disagree with you. The rope example was actually good. – Adrian Lis Dec 14 '15 at 16:39
  • 1
    Actually, it *is* possible. If you write a new class which behaves like a string, but actually stores references to multiple chunks internally. The original SGI STL had a class called `rope` like this. – Martin Bonner supports Monica Dec 14 '15 at 16:41
  • There is a way to (attempt) to extend allocated memory block, and your answer is wrong. – SergeyA Dec 14 '15 at 16:43
  • @SergeyA - realloc is a good thing to point out but it will still require a copy to move the data. – Anders Dec 14 '15 at 16:45
  • @MartinBonner but it would have to either assume that there can be only N number of chunks, which is not feasible solution or dynamically create nodes that are appended therefore this implies memory allocation that the op did not want. – Adrian Lis Dec 14 '15 at 16:46
  • @MartinBonner: That would still violate the "continuous memory" constraint. – too honest for this site Dec 14 '15 at 16:46
  • @Anders: I am not being rude! Just disagreeing with you does not mean one is rude. – too honest for this site Dec 14 '15 at 16:47
  • @Olaf: I don't see a "continuous memory" constraint. (Relaxing that does rule out C of course.) – Martin Bonner supports Monica Dec 14 '15 at 16:48
  • @Anders, yes I deal with lots of data, and gprof told me I have to cut down on copying byte arrays. Good suggestion with ropes, but as I see, it will really copy the data when provided as a C string. So, bad luck for me. – rightaway717 Dec 14 '15 at 16:49
  • @MartinBonner: 1) A correct answer would have to handle both, C and C++ 2) From the question: "Or if possible, I would like to achieve the same in C++," – too honest for this site Dec 14 '15 at 16:50
  • @Olaf, I apologize - now I was being rude. However, I stand by my position that: the OP wanted to concatenate *strings*, the only way to have a contiguous "string" is to have something that appears contiguous but isn't. I'm new enough here that I probably don't understand, but I don't get the C vs. C++ problem. Yes, they are separate languages but they are *very* closely related. Is this so bad? I read the question something like: Can it be done in "C"? if not then in "C++". – Anders Dec 14 '15 at 16:51
  • @Olaf: That wasn't how I read the question. I read it as a "either C *or* C++" – Martin Bonner supports Monica Dec 14 '15 at 16:53
  • @Anders: A wax-apple might exactly look like an apple, but try eating it. At some lower level a quacking and swimming thing might not behave like a duck anymore. – too honest for this site Dec 14 '15 at 16:55
  • 1
    It is possible under some circumstances, on some systems. It requires precise data placement, physically swapping non-volatile memory chips and/or changing the address decoding. – Martin James Dec 14 '15 at 16:56
  • @rightaway717 Then the best solution is to over-allocate and bring the data in already to the right place something like r-nar's comment. Which may not be possible. As you have a real performance problem, it's best to think at a low level how the most efficient way to move those bytes from disk to memory and shuffle it around as your application dictates. Then find a way to map it to higher level constructs. STL and your own C++ templates can be a way to mask the complexity but keep the algorithm readable and keep performance. Good luck, sounds like it will not be easy. – Anders Dec 14 '15 at 16:57
  • @Olaf I think I understand - you see the term "C/C++" or the appearance of both tags to imply they are the same language (which they are not) or to be glossing-over that they are not. But many constructs are in common and some answers are dual-applicable - as long as we keep that firmly in mind that they are not the same language. – Anders Dec 14 '15 at 16:59
  • @Anders: If you have read my comments, it should be clear that I'm absolutely **against** treating C and C++ as the same language - I'm not a noob! The answer for C is simply - you can't! For C++ it is - you can't, as given. This is because OP does not give enough information. Anyway, the question is too broad and not helpful for C programmers at least, as there is no satisfactory solution. – too honest for this site Dec 14 '15 at 17:04
  • @Olaf so I think I agree (I did not say you were a noob - I am to SO, not C or C++), I did not read it as treating them as the same language, just a two part question. My answer is in bold (for both languages: No) - the rest was intended as explanation. – Anders Dec 14 '15 at 17:12
  • 1
    I already removed the DV. After the explanation of ropes, I can live with it. – too honest for this site Dec 14 '15 at 17:16
2

Text concatenation without copying is possible by writing your own string data structure. This is easier in C++ than in C.

struct My_String
{
  std::vector<char *> text_fragments;
};

You would have to implement all the text manipulation and searching algorithms based on this data structure. Nothing in the C library could be applied to the My_String structure. The std::string in C++ would not be compatible.

One of the issues is how to handle text modification. If one of the text fragments is a constant literal (that can't be modified), it would need to be copied before it could be modified. But copying is against the requirements. :-(
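The fragment idea above can also be sketched in plain C. This is only a minimal illustration under stated assumptions: the names `frag_string`, `fs_append`, and `fs_char_at` are hypothetical (not from any library), and the fragment table has a small fixed capacity so that appending never allocates. Appending stores a pointer and a length, so no character data is copied, but the result is not a contiguous C string and cannot be handed to functions such as `strlen`.

```c
#include <stddef.h>
#include <string.h>

#define FS_MAX_FRAGMENTS 8  /* assumption: a small fixed limit keeps the sketch allocation-free */

/* One borrowed piece of text; the fragment does not own the bytes. */
struct fragment {
    const char *data;
    size_t len;
};

/* A "string" made of several non-contiguous fragments. */
struct frag_string {
    struct fragment frags[FS_MAX_FRAGMENTS];
    size_t count;
    size_t total_len;
};

/* Append a NUL-terminated piece by reference.
 * Returns 0 on success, -1 if the fragment table is full. */
static int fs_append(struct frag_string *s, const char *text)
{
    if (s->count == FS_MAX_FRAGMENTS)
        return -1;
    s->frags[s->count].data = text;
    s->frags[s->count].len = strlen(text);
    s->total_len += s->frags[s->count].len;
    s->count++;
    return 0;
}

/* Return the character at logical index i, or '\0' if i is out of range. */
static char fs_char_at(const struct frag_string *s, size_t i)
{
    for (size_t k = 0; k < s->count; k++) {
        if (i < s->frags[k].len)
            return s->frags[k].data[i];
        i -= s->frags[k].len;  /* skip past this fragment */
    }
    return '\0';
}
```

Starting from a zero-initialized `struct frag_string`, two calls to `fs_append` "concatenate" two buffers in constant time; the price is that every indexed access walks the fragment table, and the original buffers must outlive the `frag_string`.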

Thomas Matthews
1

A "string" in C is an array of chars with a null char at the end, and an array is "a data structure that lets you store one or more elements consecutively in memory" (GNU C reference).

You cannot concatenate two arrays that are not in consecutive memory blocks without copying one of them. You can, however, do it without allocating new memory, as long as the destination buffer is already large enough. For example:

char* str1 = malloc(100);  // size 100 bytes, uninitialised
str1[0] = '\0';            // string length 0, size of str1 100
strcat(str1, "a");         // string length 1, size of str1 still 100
strcat(str1, "b");         // string length 2, size of str1 still 100

If you want, you could retrieve the chars of two strings as if they were one, without copying or reallocating. Here is an example function that does that (a simple example; don't use it in production code):

char* str1 = (char*)malloc(100);
char* str2 = (char*)malloc(50);

char get_char(int i) {
    if (i >= 0 && i < 100) {
        return str1[i];
    }
    if (i >= 100 && i < 150) {
        return str2[i-100];
    }
    return 0;
}

But in such a case you couldn't have a char* str3 to do pointer arithmetic with and reach all 150 chars.

Manos Nikolaidis
0

The C and C++ tags are contradictory. In C, I'd recommend exploring realloc. You can code something along the following lines:

char* str = malloc(50);
str = realloc(str, 55);  // may move the block; check for NULL in real code

If you are lucky, the realloc call will not allocate new memory and will just extend the already allocated segment, but there is no guarantee of this. This way you at least have a shot at avoiding reallocation of the string. You will still have to copy the contents of the second string into the newly allocated memory.
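A hedged sketch of how the realloc approach might look in full. The helper name `append_in_place` is hypothetical. It grows the first buffer, which may or may not move it, and then copies the second string into the new tail, so one copy of the second string remains unavoidable.

```c
#include <stdlib.h>
#include <string.h>

/* Grow *dst so it can hold its current contents followed by src.
 * realloc may extend the block in place or move it (copying the old bytes);
 * the C standard gives no guarantee either way.
 * Returns 0 on success, -1 on allocation failure (*dst stays valid). */
static int append_in_place(char **dst, const char *src)
{
    size_t dst_len = strlen(*dst);
    size_t src_len = strlen(src);

    /* Use a temporary so the original pointer is not lost if realloc fails. */
    char *grown = realloc(*dst, dst_len + src_len + 1);
    if (grown == NULL)
        return -1;

    memcpy(grown + dst_len, src, src_len + 1);  /* copies src and its terminator */
    *dst = grown;
    return 0;
}
```

The first argument must point to a buffer obtained from malloc, calloc, or realloc that holds a NUL-terminated string; the caller still frees the result with a single free call.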

SergeyA
  • 7
    This is a comment, but not an answer – too honest for this site Dec 14 '15 at 16:37
  • @Olaf, how so? What do you want to see in an answer? – SergeyA Dec 14 '15 at 16:39
  • My god, 7 users agree with Olaf. What's wrong with my answer??? It would really be beneficial if downvoters would care to explain what's wrong with the answer. Helps everybody. – SergeyA Dec 14 '15 at 16:39
  • 2
    @SergeyA I don't see how this answers the question of how to concatenate two strings without copying. – eerorika Dec 14 '15 at 16:43
  • @SergeyA: A lot of people upvoted it back when the answer was just "Tags C and C++ are contradictory. In C, I'd recommend exploring `realloc`." Your first sentence was clearly a comment, and your second sentence was, *at best* a poor answer. – Nicol Bolas Dec 14 '15 at 16:46
  • `realloc` still might copy which violates constraints. – too honest for this site Dec 14 '15 at 16:48
  • @Olaf, and I mentioned that. but at least it attempts to do so. No idea what's wrong with that. – SergeyA Dec 14 '15 at 16:49
  • Well, I fully agree with the first sentence. That should just have been a comment, not (part of) the answer. However, the fact that `realloc` might copy violates the constraint. – too honest for this site Dec 14 '15 at 16:58
  • 1
    Realloc doesn't solve the problem regardless. If you have two strings, and realloc the first so it has more space, you still have to copy the second into that extra space. Even if I'm missing something, at the very least it's not clear that it answers the question, in its current state it just gives a useful FYI that might help with a real solution. That's why you're being down voted I guess (not by me). – Nir Friedman Dec 14 '15 at 16:59