C move memory parts inplace

Question

I am implementing several datastructures and one primitive I want to use is the following: I have a memory chunk A[N] (it has a variable length, but I take 100 for my examples) and inside this chunk, there is a smaller part C of length K (lets say 30) which I want to move without using any additional memory.

The additional difficulty is, that A "wraps", that is, C can start at A[80] and then the first 20 elements of C are the elements A[80..100] and the last 10 elements are the elements A[0..10]. Furthermore, the target range may also "wrap" and overlap with C in any possible way. Additionally, I don't want to use more than a constant amount of additional memory, everything should happen in place. Also, the part of A which is neither in the target range nor in the source range may contain something important, so it cannot be used either. So one case would be the following:

A looks like this:

|456789ABCDEF0123456789AB|-----|0123|

And should be transformed to this:

|89AB|-----|0123456789ABCDEF01234567|

Just delegating it to a library or use another datastructure from a library is not an option here, I want to understand the problem myself. On the first sight, I thought that it might not be trivial, but as soon as you distinguish a few cases, it becomes clear, but now I am having serious trouble. Of course there are the trivial cases if they don't overlap or don't wrap, but at least if both happens at the same time, it gets messy. You could start with one free place and move the part that belongs there, but then you create another free part somewhere else and it gets hard to keep track of which parts you can stil use.

Maybe I am missing something completely, but even my special case if the target range does not wrap has almost 100 lines (half of it are assertions and comments, though) and I could update it so that it also handles the general case with some additional index calculations, but if someone has an elegant and short solution, I would appreciate some help. Intuitively I think that this should somehow be trivial, but I just don't see the best solution yet.

Note: The interesting case is of course, if C is almost as big as A. If |C| < N/2, it is trivial.

edit: Using more than a constant amount of additional flags/indices counts as additional memory and I want to avoid that if possible.

edit: Some people wanted to see my code. My question is rather abstract, so I didn't want to post it, but maybe someone sees how to improve it. It is terrible, it only works for the case that the target starts at the beginning (however, that can easily be changed) and terribly long, but it does the job without additional memory in O(n).

#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>

void move_part(int* A, size_t N, size_t target, size_t source, size_t size, int show_steps)
{
  assert(source + size <= N);
  assert(target + size <= N);
  if (show_steps) {
    printf("Moving size %d from %d to %d.\n", size, source, target);
  }
  memmove(A + target, A + source, size * sizeof(int));
}

void swap_parts(int* A, size_t N, size_t first_begin, size_t second_begin, size_t size, int show_steps)
{
  if (show_steps) {
    printf("Swapping size %d at %d and %d.\n", size, first_begin, second_begin);
  }
  assert(first_begin + size <= N);
  assert(second_begin + size <= N);
  size_t i;
  for (i = 0; i < size; ++i) {
    int x = A[first_begin + i];
    A[first_begin + i] = A[second_begin + i];
    A[second_begin + i] = x;
  }
}

void move_to_beginning(int* A, size_t N, size_t begin, size_t size, int show_steps)
{
  assert(begin <= N);
  assert(size <= N);
  // Denotes the start of our "working range". Increases during
  // the algorithm and becomes N
  size_t part_start = 0;
  // Note: Keeping the size is crucial since begin == end could
  // mean that the range is empty or full.
  size_t end = (begin + size) % N;
  while (part_start != N) {
    size_t i;
    if (show_steps) {
      for (i = 0; i < N; ++i) {
    printf("%d ", A[i]);
      }
      printf("\n");
      printf("part_start %d  begin %d  end %d  size %d\n", part_start, begin, end, size);
    }
    // loop invariants
    assert(part_start < N);
    // The two pointers are in our range
    assert(part_start <= begin && begin <= N);
    assert(part_start <= end && end <= N);
    // size is valid (wrapped case, non-empty, non-full case)
    assert(begin <= end || (N - begin) + (end - part_start) == size);
    // size is valid (non wrapped case, non-empty, non-full case)
    assert(begin >= end || end - begin == size);
    // size is valid (working range is full or empty case)
    assert(begin != end || size == 0 || part_start + size == N);
    if (size == 0 || begin == N || begin == part_start) {
      // ##|1234|# -> 1234### ||
      if (show_steps) {
    printf("Case 1:\nTerminating\n");
      }
      // #||# -> ## ||
      // 12|##| -> 12## ||
      // |12|## -> 12## ||
      break;
      /* Not necessary any more, but would be the correct transformation:
     part_start = N;
     begin = N;
     end = N;
     size = 0;*/
    } else if (end == part_start) {
      // |##|123 -> ##|123|
      if (show_steps) {
    printf("Case 2:\n");
    printf("Setting end to %d.\n", N);
      }
      end = N;
    } else if (begin < end) {
      // ##|1234|# -> 1234### ||
      if (show_steps) {
    printf("Case 3:\n");
      }
      move_part(A, N, part_start, begin, size, show_steps);
      break;
      /* Not necessary any more, but would be the correct transformation:
     part_start = N;
     begin = N;
     end = N;
     size = 0;*/
    } else {
      size_t end_size = end - part_start;
      size_t begin_size = N - begin;
      assert(begin_size + end_size == size);
      if (end_size >= begin_size) {
    // 345|#|12 -> 12 5|#|34
    if (show_steps) {
      printf("Case 4:\n");
    }
    swap_parts(A, N, part_start, begin, begin_size, show_steps);
    assert(begin_size > 0); // Necessary for progress
    part_start += begin_size;
    size = end_size;
    // begin, end remain unchanged
      } else if (begin - part_start <= begin_size) {
    // 56|#|1234 -> 123 56|#|4
    size_t size_moved = begin - part_start;
    assert(size_moved >= end_size); // else the next step would be more efficient
    if (show_steps) {
      printf("Case 5\n");
    }
    swap_parts(A, N, part_start, begin, end_size, show_steps);
    move_part(A, N, end, begin + end_size, begin - end, show_steps);
    assert(end_size + (begin - end) == size_moved);
    size -= size_moved;
    part_start = begin;
    begin += size_moved;
    end += size_moved;
      } else if (end_size <= begin_size) {
    // 45|##|123 -> 123 #|45|# 
    if (show_steps) {
      printf("Case 6\n");
    }
    swap_parts(A, N, part_start, begin, end_size, show_steps);
    move_part(A, N, end, begin + end_size, begin_size - end_size, show_steps);
    part_start += begin_size;
    size = end_size;
    end = begin + end_size;
    // begin remains unchanged
      } else {
    // No case applies, this should never happen
    assert(0);
      }
    }
  }
}


int main()
{
  int N = 20;
  int A[20];
  size_t size = 17;
  size_t begin = 15;
  size_t i;
  for (i = 0; i < size; ++i) {
    A[(begin + i) % N] = i;
  }
  move_to_beginning(A, N, begin, size, 0);
  for (i = 0; i < size; ++i) {
    printf("%d ", A[i]);
  }
  printf("\n");
  return 0;
}

By "move", do you mean the equivalent of `memmove`, i.e. copy where source and dest ranges may overlap? Or are you trying to do a completely non-destructive operation where the data that would be clobbered is simultaneously moved to "old" part of the source range? — R.. GitHub STOP HELPING ICE, Sep 09 '12 at 15:53
@R - I think if the question is taken literally, it's the former: "_the part of A which is neither in the target range nor in the source range may contain something important_" — gcbenison, Sep 09 '12 at 16:15
@R.. Yes, but memmove does not allow wrapped ranges, but yes, that is what I am trying to achieve. — Lykos, Sep 09 '12 at 17:14
@Yahia I don't think my code is useful, it is terrible, complicated and long and my question is rather an abstract question than about my code. But just now, I am trying to simplify it (i.e. taking out specific code for my datastructures) and then I'll post it. — Lykos, Sep 09 '12 at 17:15
@CarlNorum Yes, I do. If you ask this, you probably mean that O(n^2) solution is trivial: You just move it by one Element several times until it is at the right position. Not necessarily n movements, but O(n) would be desirable. — Lykos, Sep 09 '12 at 17:33
To the OP: the question would be easier if the diagrams were of the form `XXXXXyyyZZZZZZZZZZ` instead of your `CC---CCCC` form, it would be possible for us to track where the individual elements need to go. — wildplasser, Sep 09 '12 at 17:46
Thanks. I'll look into it. (permutation groups is one of my pet peeves, like matrix tranposition) — wildplasser, Sep 09 '12 at 21:06
I added some tags that might help this question get more attention, including `language-agnostic`, since although you're working in C it's a very general algorithms question... well, at least one that applies to any language with a notion of "in-place". — R.. GitHub STOP HELPING ICE, Sep 10 '12 at 14:17
Isn't the problem just rotating a circular buffer? Or am I misunderstanding? — Ari, Sep 10 '12 at 16:57
It's more general. K=N would be the "rotating a circular buffer" case, and 2K<=N is trivial, but the remaining cases are ugly. — R.. GitHub STOP HELPING ICE, Sep 10 '12 at 18:02

Evgeny Kluev · Accepted Answer · 2012-09-18T16:53:33.753

Case 1: Source overlaps with destination at most in a single contiguous region, which is smaller than whole array

Detailed explanation of this case is given in the first answer by R.. I've nothing to add here.

Case 2: Either source overlaps with destination in two contiguous regions or we rotate whole array

The easiest approach would be always rotate whole array. This also moves some unneeded elements from destination range, but since in this case K > N/2, this does not make number of operations more then twice as necessary.

To rotate the array, use cycle leader algorithm: take first element of the array (A[0]) and copy it to destination position; previous contents of this position move again to its proper position; continue until some element is moved to the starting position.

Continue applying the cycle leader algorithm for next starting positions: A[1], A[2], ..., A[GCD(N,d) - 1], where d is the distance between source and destination.

After GCD(N,d) steps, all elements are on their proper positions. This works because:

Positions 0, 1, ..., GCD(N,d) - 1 belong to different cycles - because all these numbers are different (modulo GCD(N,d)).
Each cycle has length N / GCD(N,d) - because d / GCD(N,d) and N / GCD(N,d) are relatively prime.

This algorithm is simple and it moves each element exactly once. It may be made thread-safe (if we skip the write step unless inside the destination range). Other multi-threading-related advantage is that each element may have only two values - value before "move" and value after "move" (no temporary in-between values possible).

But it does not always have optimal performance. If element_size * GCD(N,d) is comparable to cache line size, we might take all GCD(N,d) starting positions and process them together. If this value is too large, we can split starting positions into several contiguous segments to lower space requirements back to O(1).

The problem is when element_size * GCD(N,d) is much smaller than cache line size. In this case we get a lot of cache misses and performance degrades. gusbro's idea to temporarily swap array pieces with some "swap" region (of size d) suggests more efficient algorithm for this case. It may be optimized more if we use "swap" region, that fits in the cache, and copy non-overlapped areas with memcpy.

One more algorithm. It does not overwrite elements that are not in the destination range. And it is cache-friendly. The only disadvantage is: it moves each element exactly twice.

The idea is to move two pointers in opposite directions and swap pointed elements. There is no problem with overlapping regions because overlapping regions are just reversed. After first pass of this algorithm, we have all source elements moved to destination range, but in reversed order. So second pass should reverse destination range:

for (d = dst_start, s = src_end - 1;
     d != dst_end;
     d = (d + 1) % N, s = (s + N - 1) % N)
  swap(s, d);

for (d = dst_start, s = dst_end - 1;
     d != dst_end;
     d = (d + 1) % N, s = (s + N - 1) % N)
  swap(s, d);

I'm not clear on how/why it's valid to rotate the whole array when K — R.. GitHub STOP HELPING ICE, Sep 11 '12 at 22:28
Hmm, perhaps that's easy to solve; you just skip the actual write step when the position is outside of the destination range. — R.. GitHub STOP HELPING ICE, Sep 11 '12 at 22:39
Ok, thanks. I had to think about 1. first, but obviously adding or subtracting d doesn't change the value modulo GCD(N,d) and then it becomes clear. @R.. It is valid to rotate the whole array for case 2. If source and target ranges overlap in two contiguous regions, then every element is either covered by the source range or by the destination range, hence no data is lost that would not be lost anyway in this case. — Lykos, Sep 12 '12 at 00:05
`memmove` does not clobber the source; the source (except possibly parts of the source overlapped with the destination) is considered read-only. If you want to be the analogue of `memmove` for a circular buffer, you can't write outside `dest`. — R.. GitHub STOP HELPING ICE, Sep 12 '12 at 00:32
Indeed, my solution for avoiding clobbering (just skipping the write step unless you're inside the destination range) works, so this answer gives a complete solution. I'll leave the bounty around a few days just to draw attention and hopefully get the good answers some upvotes, but short of a radically better answer appearing, I plan to award the bounty to your answer. — R.. GitHub STOP HELPING ICE, Sep 12 '12 at 01:30
Ok, you are right, I should not write outside dest, I just meant that for my application, it doesn't matter if you move everything. But thanks for your additional comment that one can just omit the write step. And thanks Evgeny for your additional comment about the caching, that was one thing I was still worried about. — Lykos, Sep 13 '12 at 22:44
So 2*K>N is not the proper test. For example, moving 60..99,0..50 by d=+2 won't alter 53..61. If you test that K+d — aka.nice, Sep 14 '12 at 22:13
@Lykos: There are a little bit more comments and other simple algorithm. — Evgeny Kluev, Sep 18 '12 at 16:58
Yes, I saw it, thanks, the last algorithm is a very nice idea, too and of course I am glad about the cache-friendlyness. — Lykos, Sep 18 '12 at 17:57

score 3 · Answer 2 · answered Sep 10 '12 at 02:45

3

This is not a complete answer yet, but I think it may be the right idea.

Start with an element of the source range and consider the destination position it will be mapped to. That position is either inside the source range, or outside it. If it's outside the source range, you can just copy, and you're done with that element. On the other hand, if it maps onto a destination position inside the source range, you can copy it, but you have to save the old value you're overwriting and perform the above process iteratively with this new element of the source.

Essentially, you're operating on the cycles of a permutation.

The problem is keeping track of what you've finished and what remains to be done. It's not immediately apparent if there's a way to do this without O(n) working space.

answered Sep 10 '12 at 02:45

R.. GitHub STOP HELPING ICE

208,859
35
376
711

Yes, thanks, I thought about something like that, too and that could yield a beautiful solution, if there if a way to guarantee, that I move every element exactly once. Maybe it even suffices if I just iterate through that part of the target range that is not covered by the source range and subsequently copy the element that belongs there to the free space and then to the newly created free space and so on, until I get outside the target range. – Lykos Sep 10 '12 at 09:02
Imagine the case where `K==N`. There is nothing outside the target range and the move is really a permutation that's some power of the permutation `(0 1 2 3 4 ... N-1)`. If the offset between source and dest is relatively prime to `N`, this power is an `N`-cycle, and you'll finish the whole move doing what I described without any need to track what's already been copied. This will always be the case if `N` is prime, of course. But in general, it will break into a product of disjoint orbits... The K – R.. GitHub STOP HELPING ICE Sep 10 '12 at 14:15
Ok, you are right, stupid of me, I have to take care of some special cases, but you can easily shift by a constant number as in the O(n^2) algorithm below and make your offset relatively prime to N. So only groups with fewer than N/constant generators are a problem or if they are very badly distributed, e.g if N = n! for some n and the offset is n/2, then no generator is close. But anyway, in my situation, I know that N is a power of two and that makes it a little easier here. – Lykos Sep 10 '12 at 19:37

gusbro · Answer 3 · 2012-09-18T18:21:45.090

3

This solution is O(N) and uses already processed source locations as scratch space to use when ranges overlap. It will swap contents of source and destination up to the point when it reaches the start of destination, then it will proceed copying from the scratch space generated before. The second loop restores the clobbered region after each character of the scratch space is used.

move(A,N, src_idx, dst_idx, len)
{
  first_dst_idx=dst_idx;
  first_src_idx=src_idx;
  mlen=0;
  while(src_idx != first_dst_idx && len > 0)
  {
    temp = A[dst_idx];
    A[dst_idx] = A[src_idx];
    A[src_idx] = temp;
    src_idx=(src_idx+1) mod N;
    dst_idx=(dst_idx+1) mod N;
    len--; mlen++;
  }
  src_idx = first_src_idx;
  while(len > 0)
  {
    A[dst_idx] = A[src_idx];
    A[src_idx] = A[first_dst_idx];
    src_idx=(src_idx+1) mod N;
    dst_idx=(dst_idx+1) mod N; 
    first_dst_idx=(first_dst_idx+1) mod N;
    len--;
  }
  while(mlen > 0)
  { // restore reamining scratch space
    A[src_idx] = A[first_dst_idx];
    src_idx=(src_idx+1) mod N;
    first_dst_idx=(first_dst_idx+1) mod N;
    mlen--;
  }
}

edited Sep 18 '12 at 18:21

answered Sep 11 '12 at 21:20

gusbro

22,357
35
46

The garbage could be fixed by copying that region back from the corresponding range of the destination at the end, right? I don't think OP needs this, but it might be needed in general. However, even if you can restore the wrongly-clobbered region of the source, the temporary clobbering makes this approach non-thread-safe if other threads are not expecting "source" to be clobbered. – R.. GitHub STOP HELPING ICE Sep 11 '12 at 22:38
@R.: Yes, you can restore the clobbered region in the second loop, after having used the scratch location justy copy back the original character. When regions overlap I think none of the algorithms posted in the question are thread safe... – gusbro Sep 12 '12 at 02:34
The accepted answer, with my proposed modification from the comments, is thread-safe and avoids overwriting anything outside of dest even temporarily. – R.. GitHub STOP HELPING ICE Sep 12 '12 at 02:39
@R.. This answer may be modified to avoid overwriting anything outside of dest (for most interesting case K > N/2). Just select the region to be swapped with other regions at the very beginning of destination area. Then swap any succeeding piece of the array with this region. Then shuffle parts of the "swap" region and copy the "unmodifiable" piece into it. One disadvantage is that most array elements are copied twice. Also this modification makes algorithm complicated. On the positive side, this algorithm is cache-friendly. – Evgeny Kluev Sep 12 '12 at 11:01
@R..: In the accepted answer, if a thread gets the chance to run in the middle of the algorithm, source data may be inconsistent... so I'd argue that it's not thread safe. – gusbro Sep 12 '12 at 13:21
How can it be inconsistent if you never write to it? I don't follow. – R.. GitHub STOP HELPING ICE Sep 12 '12 at 13:32
@R..: Because if the two ranges overlap there has to be a moment when you are copying over the source range but you didn't finish copying all overlapping characters. In that period of time the array is inconsistent (neither the source nor the destination have proper data) – gusbro Sep 12 '12 at 13:51
There are plenty of times where the destination range (and the part of the source that's being overwritten because it's also in the destination range) are incomplete. But no position ever has any value except its original value or its final value, and no position outside the destination range is ever written at all. – R.. GitHub STOP HELPING ICE Sep 12 '12 at 13:57
Thanks for this solution, sadly I can only accept one solution. But you should change o(N) to O(N) since o(N) has a different meaning. – Lykos Sep 18 '12 at 18:02

gcbenison · Answer 4 · 2012-09-09T16:40:52.297

** this only works if the length of C is <= half the length of A. But I'm leaving it up here in hopes of fixing it.**

** this solution will not preserve any of the contents of the target range, a behavior which I believe matches the wording of the original question **

;; A function that wraps an out-of-bounds index to its proper location.
mod'(i):
  return (i + length(A)) mod length(A)

;; shifts the range A[i]..A[i + n] to A[i - delta]..A[i - delta + n]
move_backward (i,delta,n):
   A[mod'(i - delta)] = A[mod'(i)]
   if (n > 0):
     move_backward (i + 1, delta, n - 1)

;; shifts the range A[i - n]..A[i] to A[i - n + delta]..A[i + delta]
move_forward (i, delta, n):
   A[mod'(i + delta)] = A[mod'(i)]
   if (n > 0):
      move_forward (i - 1, delta, n - 1)

shift_range (source_first, source_last, target_first):
   n = mod'(source_last - source_first)
   delta = mod'(target_first - source_first)

   if (delta > length(A) / 2):
       move_backward (source_first, length(A) - delta, n)
   else
       move_forward (source_last, delta, n)

Yes, the situation I have problems with is the case if the beginning of the target range is before the end of the source range which is equivalent to |C| <= |A|/2 as you say. But thanks anyway. — Lykos, Sep 09 '12 at 16:54

score 2 · Answer 5 · answered Sep 09 '12 at 18:35

OK, if it's like memmove but with a circular buffer, here's the way to do it:

Case 1: source/dest do not overlap. Just use memcpy, possibly breaking it up as needed where the buffer wraps.
Case 2: source/dest are equal. Do nothing.
Case 3: start of source lies strictly inside the dest region. Do a simple forward copy loop, for (i=0; i<k; i++) A[(dest+i)%N] = A[(src+i)%N];
Case 4: start of dest lies strictly inside the source region. Do a simple backward copy loop, for (i=K; i; i--) A[(dest+i-1)%N] = A[(src+i-1)%N];

Edit: This answer only works when K is at most N/2; otherwise it's possible that source and dest both start inside each other. I don't have an immediate fix, but it may be possible to choose a starting offset and direction that fix the issue...

Yes, but that is exactly the case I am interested in. The other cases are easy, but that is extremely hard. — Lykos, Sep 09 '12 at 19:26

Carl Norum · Answer 6 · 2012-09-09T17:58:59.790

0

Here's an O(n²) algorithm is pretty straightforward - just rotate the entire buffer a single byte, and then repeat that as many times as steps you want to:

void rotateBuffer(char *buffer, int size, int steps)
{
    char tmp;
    int i;

    for (i = 0; i < steps; i++)
    {
        tmp = buffer[size - 1];
        memmove(buffer + 1, buffer, size - 1);
        buffer[0] = tmp;
    }
}

It won't be fast, but its get the job done, and with only constant temporary storage.

Edit:

If you need to rotate just a sub-part of the buffer relative to a static underlying 'background', as discussed below in the comments, you can do something like this:

void rotateBuffer(int count, int start, int length)
{
    int i;
    int j;
    int index;

    // rotate 'count' bytes
    for (i = 0; i < count; i++)
    {
        // rotate by a single byte
        for (j = length - 1; j >= 0; j--)
        {
            index = start + i + j;
            buf[(index + 1) % SIZE] = buf[index % SIZE];
        }
    }
}

I think it might have a problem if you need to rotate the entire buffer, but in that case you could just fall back to the code above.

edited Sep 09 '12 at 17:58

answered Sep 09 '12 at 17:27

Carl Norum

219,201
40
422
469

The question is about moving/rotating a part of the buffer, while leaving the rest unchanged. – interjay Sep 09 '12 at 17:30
Like there are two buffers, one overlaid on the other, and we're just rotating one relative to the other? Does that even make sense? – Carl Norum Sep 09 '12 at 17:31
The way I understand the question, it's like `memmove` that works on a circular buffer. The parts which are not inside the destination range should not be changed. If the source range consisted of the entire buffer, then it would rotate it. – interjay Sep 09 '12 at 17:34
@interjay - OK. I made a version that rotates that way. – Carl Norum Sep 09 '12 at 17:54
That is how I meant it, however, I did know that there is an easy O(n^2) solution, but I am looking for an O(n) solution. – Lykos Sep 09 '12 at 18:19
@user1655480 : the question is not very clear. See my comment on the question. – wildplasser Sep 09 '12 at 18:25

C move memory parts inplace

6 Answers6

Case 1: Source overlaps with destination at most in a single contiguous region, which is smaller than whole array

Case 2: Either source overlaps with destination in two contiguous regions or we rotate whole array

Linked