Concatenating arrays in place

Question

I ran into an issue while implementing a circular buffer that must occasionally be aligned.

Say I have two arrays, leftArr and rightArr. I want to move the right array to byteArr and the left array to byteArr + the length of the right array. Both leftArr and rightArr are greater than byteArr, and rightArr is greater than leftArr. (this is not quite the same as rotating a circular buffer because the left array does not need to start at byteArr) Although the left and right arrays do not overlap, the combined array stored at byteArr may overlap with the current arrays, stored at leftArr and rightArr. All memory from byteArr to rightArr + rightArrLen can be safely written to. One possible implementation is:

void align(char* byteArr, char* leftArr, int leftArrLen, char* rightArr, int rightArrLen) {
  char *t = malloc(rightArrLen + leftArrLen);

  // form concatenated data
  memcpy(t, right, rightArrLen);
  memcpy(t + rightArrLen, left, leftArrLen);

  // now replace
  memcpy(byteArr, t, rightArrLen + leftArrLen);
  free(t);
}

However, I must accomplish this with constant memory complexity.

What I have so far looks like this:

void align(char* byteArr, char* leftArr, int leftArrLen, char* rightArr, int rightArrLen)
{
    // first I check to see if some combination of memmove and memcpy will suffice, if not:
    unsigned int lStart = leftArr - byteArr;
    unsigned int lEnd = lStart + leftArrLen;
    unsigned int rStart = rightArr - byteArr;
    unsigned int rEnd = rStart + rightArrLen;
    unsigned int lShift = rEnd - rStart - lStart;
    unsigned int rShift = -rStart;
    char temp1;
    char temp2;
    unsigned int nextIndex;
    bool alreadyMoved;

    // move the right array
    for( unsigned int i = 0; i < rEnd - rStart; i++ )
    {
        alreadyMoved = false;

        for( unsigned int j = i; j < rEnd - rStart; j-= rShift )
        {
            if(    lStart <= j + rStart - lShift
                && j + rStart - lShift < lEnd
                && lStart <= (j + rStart) % lShift
                && (j + rStart) % lShift < lEnd
                && (j + rStart) % lShift < i )
            {
                alreadyMoved = true;
            }
        }

        if(alreadyMoved)
        {
            // byte has already been moved
            continue;
        }

        nextIndex = i - rShift;
        temp1 = byteArr[nextIndex];
        while( rStart <= nextIndex && nextIndex < rEnd )
        {
            nextIndex += rShift;
            temp2 = byteArr[nextIndex];
            byteArr[nextIndex] = temp1;
            temp1 = temp2;
            while( lStart <= nextIndex && nextIndex < lEnd )
            {
                nextIndex += lShift;
                temp2 = byteArr[nextIndex];
                byteArr[nextIndex] = temp1;
                temp1 = temp2;
            }
            if( nextIndex <= i - rShift )
            {
                // byte has already been moved
                break;
            }
        }
    }

    // move the left array
    for( unsigned int i = lStart; i < lShift && i < lEnd; i++ )
    {
        if( i >= rEnd - rStart )
        {
            nextIndex = i + lShift;
            temp1 = byteArr[nextIndex];
            byteArr[nextIndex] = byteArr[i];
            while( nextIndex < lEnd )
            {
                nextIndex += lShift;
                temp2 = byteArr[nextIndex];
                byteArr[nextIndex] = temp1;
                temp1 = temp2;
            }
        }
    }
}

This code works in the case lStart = 0, lLength = 11, rStart = 26, rLength = 70 but fails in the case lStart = 0, lLength = 46, rStart = 47, rLength = 53. The solution that I can see is to add logic to determine when a byte from the right array has already been moved. While this would be possible for me to do, I was wondering if there's a simpler solution to this problem that runs with constant memory complexity and without extra reads and writes?

Here's a program to test an implementation:

bool testAlign(int lStart, int lLength, int rStart, int rLength)
{
    char* byteArr = (char*) malloc(100 * sizeof(char));
    char* leftArr = byteArr + lStart;
    char* rightArr = byteArr + rStart;
    for(int i = 0; i < rLength; i++)
    {
        rightArr[i] = i;
    }
    for(int i = 0; i < lLength; i++)
    {
        leftArr[i] = i + rLength;
    }
    align(byteArr, leftArr, lLength, rightArr, rLength);
    for(int i = 0; i < lLength + rLength; i++)
    {
        if(byteArr[i] != i) return false;
    }
    return true;
}

So `leftArr` and `rightArr` _always_ point _somewhere_ to the same memory block that begins with `byteArr`? — chux - Reinstate Monica, Oct 09 '17 at 20:48
I suggest you post an example of (small) data inputs and output that work as you expect, and another that does not. — Weather Vane, Oct 09 '17 at 20:51
Looks like code could begin with `void foo(char* byteArr, unsigned lStart, unsigned rStart, unsigned leftArrLen, unsigned rightArrLen);` — chux - Reinstate Monica, Oct 09 '17 at 20:53
In `unsigned int lEnd = lStart + leftArrLen;` the variable `leftArrLen` is *uninitialised*, ditto `rightArrLen`. — Weather Vane, Oct 09 '17 at 20:53
@WeatherVane I suspect the 1st 5 variables are assumed to be set - somehow - yet post remains unclear — chux - Reinstate Monica, Oct 09 '17 at 20:54
@chux then it is appropriate to make the stock comment: Please post the [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) that shows the problem. Show the input, the expected output, and the actual output as text *in the question*. — Weather Vane, Oct 09 '17 at 20:57
@chux well deduced (in your deleted answer), but there is danger when not knowing the length of `char* byteArr` - a missing function parameter. — Weather Vane, Oct 09 '17 at 21:16
@WeatherVane Although really `byteArr` doesn't have a length, it's really just "long enough" for my purposes. — yinnonsanders, Oct 09 '17 at 21:21
Can `left+left_length` be assume to be `<= right`? Else why call them left,right? — chux - Reinstate Monica, Oct 09 '17 at 21:29
@chux Sounds reasonable. I'm sure the solution involves modular arithmetic, I just don't know whether it has to be complicated or if there's a simple trick/library implementation. — yinnonsanders, Oct 10 '17 at 13:55

John Bollinger · Accepted Answer · 2017-10-10T15:07:39.940

Imagine dividing byteArr into regions (not necessarily to scale):

 X1    Left   X2  Right
|---|--------|---|------|

The X1 and X2 are gaps in byteArr before the start of the left array and between the two arrays. In the general case, any or all of those four regions may have zero length.

You can then proceed like this:

Start by partially or wholly filling in the leading space in byteArr
- If Left has zero length then move Right to the front (if necessary) via memmove(). Done.
- Else if X1 is the same length as the Right array or larger then move the right array into that space via memcpy() and, possibly, move up the left array to abut it via memmove(). Done.
- Else, move the lead portion of the Right array into that space, producing the below layout. If X1 had zero length then R1 also has zero length, X2' == X2, and R2 == Right.

       R1    Left     X2'  R2
      |---|--------|------|---|

There are now two alternatives
- If R2 is the same length as Left or longer, then swap Left with the initial portion of R2 to produce (still not to scale):

       R1'    X2''  Left    R2'
      |------|-----|-------|--|

Otherwise, swap the initial portion of Left with all of R2 to produce (still not to scale):

       R1'    L2  X2''    L1
      |------|---|-------|----|

Now recognize that in either case, you have a strictly smaller problem of the same form as the original, where the new byteArr is the tail of the original starting immediately after region R1'. In the first case the new leftArr is the (final) Left region and the new rightArr is region R2'. In the other case, the new leftArr is region L2, and the new rightArr is region L1. Reset parameters to reflect this new problem, and loop back to step (1).

Note that I say to loop back to step 1. You could, of course, implement this algorithm (tail-)recursively, but then to achieve constant space usage you would need to rely on your compiler to optimize out the tail recursion, which otherwise consumes auxiliary space proportional to the length ratio of the larger of the two sub-arrays to the smaller.

Definitely works in constant space, but the worst case time complexity seems like it would be much worse than simply rotating after step 1 (I'm not sure of this), which I believe would use constant space. — yinnonsanders, Oct 10 '17 at 15:16
Ah my bad, it's `O(n + m)` because each swap/move involves moving an element to the correct position. — yinnonsanders, Oct 10 '17 at 15:36
I agree with your analysis. In fact, if I take n as the combined size of the left and right arrays, I contend that my algorithm's amount of work is bounded by O(n), and that replacing steps 2 and 3 with a rotation does not afford a tighter bound. I'm inclined to agree that the rotation option probably has a lower coefficient, but I note that the two alternatives are in fact identical in the special case that the left and right arrays have the same length and there are no gaps. — John Bollinger, Oct 10 '17 at 17:20
Depending on the cache, swapping could be much faster than rotating because of locality. — yinnonsanders, Oct 10 '17 at 17:23
If you wish to do that then I do not object. But your rep is not high enough to edit without review, and the reviewers have a good chance of rejecting such an edit. — John Bollinger, Oct 10 '17 at 17:41

Concatenating arrays in place

1 Answers1