(C) Recursive strcpy() that takes only 1 parameter

Question

Let me be clear from the get go, this is not a dupe, I'll explain how. So, I tasked myself to write a function that imitates strcpy but with 2 conditions:

it needs to be recursive
it must take a single parameter (which is the original string)

The function should return a pointer to the newly copied string. So this is what I've tried so far:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char * my_strcpy(char *original);

int main(void) {
  char *string = my_strcpy("alpine");
  printf("string = <%s>\n", string);
  return 0;
}

char * my_strcpy(char *original){
  char *string = (char *)malloc(10);
  if(*original == '\0') {
    return string;
  }
  *string++ = *original;
  my_strcpy(original + 1);
}

The problem is somewhat obvious, string gets malloc-ed every time my_strcpy() is called. One of the solutions I could think of would be to allocate memory for string only the first time the function is called. Since I'm allowed to have only 1 parameter, only thing I could think of was to check the call stack, but I don't know whether that's allowed and it does feel like cheating. Is there a logical solution to this problem?

If reentrancy is not a concern, you *could* use a global or static variable. However, why are you imposing such unrealistic constraints upon yourself? — NPE, Dec 14 '18 at 14:49
It is not necessary or recommended to cast the return of `malloc()` in C. — ryyker, Dec 14 '18 at 14:51
Why are you tasking yourself with such artificial tasks? There's nothing to learn from this except how to write horribly slow and dangerous programs. A better task would be to try to write a `strcpy` which beats a plain character-by-character copy in terms of performance. — Lundin, Dec 14 '18 at 14:51
_[strdup](https://www.geeksforgeeks.org/strdup-strdndup-functions-c/)_ takes only one argument. — ryyker, Dec 14 '18 at 14:55
@Lundin I know, but the point is to practice recursion. I prefer the iterative way of doing things as well, but I need to learn recursion somehow. Precisely, how do you declare a variable in a recursive function only the first time you call it, and then only update it with each next time the function is called. I think this could give me a better understanding of recursion. — Sergey Teryan, Dec 14 '18 at 14:55
regarding previous comment to Lundin, i.e. _Precisely, how do you declare a variable..._. you can use a `static` in the function. — ryyker, Dec 14 '18 at 14:57
@ryyker The idea is to re-implement these functions myself, I know there's a function for everything, but I'm training my logic. I'm aware of the casting debate, but since Ritchie said you have to cast it, then that's what I'm doing. — Sergey Teryan, Dec 14 '18 at 14:58
@SergeyTeryan - if Richie said that, then he said it a long time ago, and it is no longer applicable with current C standards. — ryyker, Dec 14 '18 at 15:00
@SergeyTeryan It is a myth that you need to learn and practice recursion. The classic mistake people make is that they mix up mathematical recursion with programmatic recursion, as if they were somehow equivalent. And so students spend way too much time on it, while the real life uses are extremely limited. The only case I know of where it is actually needed, is when you wish to size-optimize a BST at the cost of execution speed. Most often you want your BST to be fast though, and then you don't use recursion but a loop. — Lundin, Dec 14 '18 at 15:00
Saying "this is not a dupe" without contrasting against other questions is probably seen as a challenge to find one. Good. That is the most efficient way to find help to your question. — Yunnosch, Dec 14 '18 at 15:08
@SergeyTeryan "but since Ritchie said you have to cast it" Richie said that back in the Jurassic period. But we've had a C standard at least since the Cretaceous period. See https://stackoverflow.com/questions/32652213/why-does-the-book-say-i-must-cast-malloc. One popular theory is that the ISO 9899 standard was what killed the dinosaurs, causing a climate change where you could no longer get away with arbitrary, custom type systems. — Lundin, Dec 14 '18 at 15:10
Your function is only allowed one parameter...but is it able to call other functions? Why not have it as a wrapper around another function that takes 2 parameters and does the actual recursion? — Chris Turner, Dec 14 '18 at 15:16
the posted code is calling `malloc()` with every execution. However, Those MANY malloc'd memory areas are not linked together to form a coherent whole. Generally, recursion has some returned value for every execution. In general, every call to a 'malloc()` must (eventually) have a call to `free(). In any case, the posted code results in the compiler outputting the warning message: "untitled.c:20:1: warning: control reaches end of non-void function [-Wreturn-type]" which needs to be corrected — user3629249, Dec 14 '18 at 15:26
I think, this function can never work recursively with only one parameter. You can make it work for one single call. If you try to copy a second string or even call it from different threads you would completely mess up your result buffer. The function does not know when a new destination string would start or the current one needs to be prolonged. — Gerhardh, Dec 14 '18 at 16:06

Thomas Kammeyer · Answer 1 · 2018-12-14T21:43:21.750

You wrote it as tail recursive, but I think without making the function non-reentrant your only option is going to be to make the function head recursive and repeatedly call realloc on the return value of the recursive call to expand it, then add in one character. This has the same problem that just calling strlen to do the allocation has: it does something linear in the length of the input string in every recursive call and turns out to be an implicitly n-squared algorithm (0.5*n*(n+1)). You can improve it by making the amortized time complexity better, by expanding the string by a factor and only growing it when the existing buffer is full, but it's still not great.

There's a reason you wouldn't use recursion for this task (which you probably know): stack depth will be equal to input string length, and a whole stack frame pushed and a call instruction for every character copied is a lot of overhead. Even so, you wouldn't do it recursively with a single argument if you were really going to do it recursively: you'd make a single-argument function that declares some locals and calls a recursive function with multiple arguments.

Even with the realloc trick, it'll be difficult or impossible to count the characters in the original as you go so that you can call realloc appropriately, remembering that other stdlib "str*" functions are off limits because they'll likely make your whole function n-squared, which I assumed we were trying to avoid.

Ugly tricks like verifying that the string is as long as a pointer and replacing the first few characters with a pointer by memcpy could be used, making the base case for the recursion more complicated, but, um, yuck.

rici · Accepted Answer · 2018-12-15T03:39:26.127

Recursion is a technique for analysing problems. That is, you start with the problem and think about what the recursive structure of a solution might be. You don't start with a recursive structure and then attempt to shoe-horn your problem willy-nilly into it.

In other words, it's good to practice recursive analysis, but the task you have set yourself -- to force the solution to have the form of a one-parameter function -- is not the way to do that. If you start contemplating global or static variables, or extracting runtime context by breaking into the call stack, you have a pretty good hint that you have not yet found the appropriate recursive analysis.

That's not to say that there is not an elegant recursive solution to your problem. There is one, but before we get to it, we might want to abstract away a detail of the problem in order to provide some motivation.

Clearly, if we have a contiguous data structure already in memory, making a contiguous copy is not challenging. If we don't know how big it is, we can do two traverses: one to find its size, after which we can allocate the needed memory, and another one to do the copy. Both of those tasks are simple loops, which is one form of recursion.

The essential nature of a recursive solution is to think about how to step from a problem to a (slightly) simpler or smaller problem. Or, more commonly, a small number of smaller or simpler problems.

That's the nature of one of the most classic recursive problems: sorting a sequence of numbers. The basic structure: divide the sequence into two roughly equal parts; sort each part (the recursive step) and put the results back together so that the combination is sorted. That basic outline has at least two interesting (and very different) manifestations:

Divide the sequence arbitrarily into two almost equal parts either by putting alternate elements in alternate parts or by putting the first half in one part and the rest in the other part. (The first one will work nicely if we don't know in advance how big the sequence is.) To put the sorted parts together, we have to interleave ("merge") the. (This is mergesort).
Divide the sequence into two ranges by estimating the middle value and putting all smaller values into one part and all larger values into the other part. To put the sorted parts together, we just concatenate them. (This is quicksort.)

In both cases, we also need to use fact that a single-element sequence is (trivially) sorted so no more processing needs to be done. If we divide a sequence into two parts often enough, ensuring that neither part is empty, we must eventually reach a part containing one element. (If we manage to do the division accurately, that will happen quite soon.)

It's interesting to implement these two strategies using singly-linked lists, so that the lengths really are not easily known. Both can be implemented this way, and the implementations reveal something important about the nature of sorting.

But let's get back to the much simpler problem at hand, copying a sequence into a newly-allocated contiguous array. To make the problem more interesting, we won't assume that the sequence is already stored contiguously, nor that we can traverse it twice.

To start, we need to find the length of the sequence, which we can do by observing that an empty sequence has length zero and any other sequence has one more element than the subsequence starting after the first element (the "tail" of the sequence.)

Length(seq):
    If seq is empty, return 0.
    Else, return 1 + Length(Tail(seq))

Now, suppose we have allocated storage for the copy. Now, we can copy by observing that an empty sequence is fully copied, and any other sequence can be copied by placing the first element into the allocated storage and then cipying the tail of the sequence into the storage starting at the second position: (and this procedure logically takes two arguments)

Copy(destination, seq):
    If seq is not empty:
        Put Head(seq) into the location destination
        Call Copy (destination+1, Tail(seq))

But we can't just put those two procedures together, because that would traverse the sequence twice, which we said we couldn't do. So we need to somehow nest these algorithms.

To do that, we have to start by passing the accumulated length down through the recursion so that we can use it at to allocate the storage when we know how big the object. Then, on the way back, we need to copy the element we counted on the way down:

Copy(seq, length):
    If seq is not empty:
        Set item to its first element (that is, Head(seq))
        Set destination to Copy(Tail(seq), length + 1)
        Store item at location destination - 1
        Return destination - 1
    Otherwise: (seq is empty)
        Set destination to Allocate(length)
        # (see important note below)
        Return destination + length

To correctly start the recursion, we need to pass in 0 as the initial length. It's bad style to force the user to insert "magic numbers", so we would normally wrap the function with a single-argument driver:

Strdup(seq):
    Return Copy (seq, 0)

Important Note: if this were written in C using strings, we would need to NUL-terminate the copy. That means allocating length+1 bytes, rather than length, and then storing 0 at destination+length.

score 0 · Answer 3 · answered Dec 14 '18 at 17:28

You didn't say we couldn't use strcat. So here is logical (although somewhat useless) answer by using recursion to do nothing other than chop off the last character and adding it back on again.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char * my_strcpy(char *original);

int main(void) {
  char *string = my_strcpy("alpine");
  printf("string = <%s>\n", string);
  return 0;
}

char * my_strcpy(char *original){
  if(*original == '\0') {
    return original;
  }
  int len = strlen(original);
  char *string = (char *)malloc(len+1);
  char *result = (char *)malloc(len+1);
  string[0] = result[0] = '\0';

  strcat (string, original);
  len--;
  char store[2] = {string[len] , '\0'}; // save last char
  string[len] = '\0';                   // cut it off
  strcat (result, my_strcpy(string));

  strcat (result, store);               // add it back
  return result;
}

(C) Recursive strcpy() that takes only 1 parameter

3 Answers3