Modifying an array "in-place" in C#?

Question

There's a leetCode problem with a prompt like

Given an array 'nums' and a value 'val', remove all instances of that value in-place and return the new length.

Do not allocate extra space for another array, you must do this by modifying the input array in-place with O(1) extra memory.

The order of elements can be changed. It doesn't matter what you leave beyond the new length.

Example 1:

Given nums = [3,2,2,3], val = 3,

Your function should return length = 2, with the first two elements of nums being 2.

Does this problem make sense in C# (one of the supported languages for solving this problem on the site)? You can't modify the length of an array, only can allocate a new array.

"In place" means that you can't allocate a new array either. Keeping a separate Length variable is not uncommon, required in a language like C. It is the most efficient way to do it. [List<>.RemoveAll()](https://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs,82567b42bbfc416e) does it for you. — Hans Passant, Oct 13 '18 at 06:46

score 3 · Answer 1 · answered Oct 13 '18 at 03:59

3

Does this problem make sense in C#? Yes.

[In C#] You can't modify the length of an array, only can allocate a new array. Absolutely right. So re-read the question, and see what you can do. I don't want to give the answer, as clearly you will learn from working this out yourself. So stop reading here; read on if you want a hint.

Hint: focus on this part...

The order of elements can be changed. It doesn't matter what you leave beyond the new length.

answered Oct 13 '18 at 03:59

Richardissimo

5,596
2
18
36

I understand how to do an unstable partition to put the "removed" values in the back. But I don't see how that satisfies the problem, which specifically asks the values to be removed (i.e. not in the same array). – iterator Oct 13 '18 at 05:43
2

@iterator I think you are misunderstanding the question, it wants the same array, it doesn't care about whats at the end, it just cares about removing the given value from the start and up to the new length – TheGeneral Oct 13 '18 at 06:11

TheGeneral · Answer 2 · 2018-10-13T06:40:32.207

A totally nonsensical approach, using unsafe, fixed, and Pointers... Why? well why not. And besides, any leet code problem ought to use leet pointers in its solve.

Given

public static unsafe int Filter(this int[] array, int value, int length)
{
   fixed (int* pArray = array)
   {
      var pI = pArray;
      var pLen = pArray + length;
      for (var p = pArray; p < pLen; p++)
         if (*p != value)
         {
            *pI = *p;
            pI++;
         }
      return (int)(pI - pArray);
   }
}

Usage

var input = new[] {23,4,4,3,32};
var length = input.Filter(4, input.GetLength(0));

for (var i = 0; i < length; i++)
   Console.WriteLine(input[i]);

Output

23
3
32

Just FYI, a normal safe solution would look like this

var result = 0;
for (var i = 0; i < length; i++)
   if (array[i] != value) array[result++] = array[i];
return result;

Benchmarks

And just because i had my benchmarker open and morbidly curious, its always good to test optimizations as you just never know. Each test is run 10000 times on each scale and Garbage collected each run.

----------------------------------------------------------------------------
Mode             : Release (64Bit)
Test Framework   : .NET Framework 4.7.1 (CLR 4.0.30319.42000)
----------------------------------------------------------------------------
Operating System : Microsoft Windows 10 Pro
Version          : 10.0.17134
----------------------------------------------------------------------------
CPU Name         : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Description      : Intel64 Family 6 Model 42 Stepping 7
Cores (Threads)  : 4 (8)      : Architecture  : x64
Clock Speed      : 3401 MHz   : Bus Speed     : 100 MHz
L2Cache          : 1 MB       : L3Cache       : 8 MB
----------------------------------------------------------------------------

Test 1

--- Standard input ------------------------------------------------------------
| Value      |    Average |    Fastest |    Cycles | Garbage | Test |    Gain |
--- Scale 100 -------------------------------------------------- Time 9.902 ---
| Array      | 434.091 ns | 300.000 ns |   3.731 K | 0.000 B | Base |  0.00 % |
| UnsafeOpti | 445.116 ns | 300.000 ns |   3.662 K | 0.000 B | N/A  | -2.54 % |
| Unsafe     | 448.286 ns | 300.000 ns |   3.755 K | 0.000 B | N/A  | -3.27 % |
--- Scale 1,000 ----------------------------------------------- Time 10.161 ---
| UnsafeOpti |   1.421 µs | 900.000 ns |   7.221 K | 0.000 B | N/A  | 17.70 % |
| Array      |   1.727 µs |   1.200 µs |   8.230 K | 0.000 B | Base |  0.00 % |
| Unsafe     |   1.740 µs |   1.200 µs |   8.302 K | 0.000 B | N/A  | -0.78 % |
--- Scale 10,000 ---------------------------------------------- Time 10.571 ---
| UnsafeOpti |  10.910 µs |   9.306 µs |  39.518 K | 0.000 B | N/A  | 20.03 % |
| Array      |  13.643 µs |  12.007 µs |  48.849 K | 0.000 B | Base |  0.00 % |
| Unsafe     |  13.657 µs |  12.007 µs |  48.907 K | 0.000 B | N/A  | -0.10 % |
--- Scale 100,000 --------------------------------------------- Time 15.443 ---
| UnsafeOpti | 105.386 µs |  84.954 µs | 362.969 K | 0.000 B | N/A  | 19.93 % |
| Unsafe     | 130.150 µs | 110.771 µs | 447.383 K | 0.000 B | N/A  |  1.12 % |
| Array      | 131.621 µs | 113.773 µs | 452.262 K | 0.000 B | Base |  0.00 % |
--- Scale 1,000,000 ------------------------------------------- Time 22.183 ---
| UnsafeOpti |   1.556 ms |   1.029 ms |   5.209 M | 0.000 B | N/A  | 23.13 % |
| Unsafe     |   2.007 ms |   1.303 ms |   6.780 M | 0.000 B | N/A  |  0.85 % |
| Array      |   2.024 ms |   1.354 ms |   6.841 M | 0.000 B | Base |  0.00 % |
-------------------------------------------------------------------------------

The question was "Does this problem make sense in C#?", not "please solve the problem for me". — Richardissimo, Oct 13 '18 at 04:02
@Richardissimo and i solved both with the one function, yes it does make sense, and yes you can do it, and i did it. thanks for your concern though. — TheGeneral, Oct 13 '18 at 04:03
what’s the benefit of using an unsafe method with pointers in this scenario? And what do you have against curly braces? — Moho, Oct 13 '18 at 04:09
@Moho Because its *leet* of course on both counts :P Also more performant with pointers — TheGeneral, Oct 13 '18 at 04:11
performant* because pointers give you direct access to the contagious memory — TheGeneral, Oct 13 '18 at 04:13
No problem you may also like to edit 'aught' to be 'ought' in the answer. We should campaign for 'performant' to be put into the dictionary [reference](https://english.stackexchange.com/questions/38945/what-is-wrong-with-the-word-performant). — Richardissimo, Oct 13 '18 at 04:42
@Richardissimo [cambridge](https://dictionary.cambridge.org/dictionary/english/performant) gave it a shot though, also [oxford](https://en.oxforddictionaries.com/definition/performant) — TheGeneral, Oct 13 '18 at 04:48
To be so performance oriented but still using post-increment operators... using `result` alone in `*(pArray + result)` then adding a line with a pre-increment operator after is more performant, but you'd have to add some curly braces. — Moho, Oct 13 '18 at 05:51
@Moho i agree, also i doubt the clr would forgive `pArray + length` either, but i am happy with no braces — TheGeneral, Oct 13 '18 at 05:59
I like it. I guess the compiler optimizes out the inefficiency of the post-increment when it's pre-incremented value isn't used. — Moho, Oct 13 '18 at 06:41
By the way, if possible, you want to use `array.Length` in the loop in the safe version - it allows you to skip two of the three overflow checks, which should bring the solution a bit closer to the optimised unsafe version. — Luaan, Oct 13 '18 at 06:49
@thebenman its a specialized tool i wrote to do all manner of things i need. Though BenchmarkDotNet is probably where you want to be at — TheGeneral, Oct 13 '18 at 07:07

Moho · Answer 3 · 2018-10-14T15:33:44.740

Looks like you're looking for an explanation of the algorithm so I'll try to explain it.

You're "rewriting" the array in place. Think of having a cursor traversing the array reading the data, while simultaneously having a cursor to write values behind it. The "read" cursor in this case could be a foreach loop or a for loop with an indexer.

Test data:

[12,34,56,34,78]

and we want to remove 34, we start with both cursors at position 0 (i.e. values[0]) with the "newLength = 0" representing the length of the "new" array and thus where the "write" cursor currently resides:

[12,34,56,34,78]
 ^r
 ^w
newLength: 0

The first element read, 12, does not match, so we include that element in the "new" array by writing it to the position in the array signified by the current length of the "new" array (to start, the length is 0, so that's values[0]. In this case, we're writing the same value to that element, so nothing changes and we move the write cursor to the next position by incrementing the length of the "new" array

[12,34,56,34,78]
    ^r
    ^w
newLength: 1

Now the next element does match the value to remove so we don't want to include it in the new array. We do this by not increasing the length of the "new" array and leaving that write cursor where it is while the read cursor moves on to the next element:

[12,34,56,34,78]
       ^r
    ^w
newLength: 1

If this array was only two elements, we're done, and no values in the array have changed but the returned length is only 1. As we have more elements, let's continue to see what happens. We read 56, which does not match the value to remove, so we "write" it at the position specified by the new "length", after which we increment the length:

[12,56,56,34,78]
          ^r
       ^w
newLength: 2

Now we read a value that matches, so we skip writing it:

[12,56,56,34,78]
             ^r
       ^w
newLength: 2

And the final value doesn't match, so we write it at the position specified by "length" and increment the length:

[12,56,78,34,78]
                ^r
          ^w
newLength: 3

The "read" cursor now exceeds the length of the array so we're done. The new "length" value is now 3 since we "wrote" three values to the array. Using the array in conjunction with the newLength value functionally results in the "new" array [12,56,78].

Here's a safe implementation of @TheGeneral's functionally correct but unsafe implementation using pointers:

public static int DoStuffSafely( int[] values, int valueToRemove, int length )
{
    var newLength = 0;

    // ~13.5% slower than @TheGeneral's unsafe implementation
    foreach( var v in values )
    {
        // if the value matches the value to remove, skip it
        if( valueToRemove != v )
        {
            // newLength tracks the length of the "new" array
            // we'll set the value at that index
            values[ newLength ] = v;
            // then increase the length of the "new" array
            ++newLength;
            // I never use post-increment/decrement operators
            // it requires a temp memory alloc that we toss immediately
            // which is expensive for this simple loop, adds about 11% in execution time
        }
    }

    return newLength;
}

Edit: for @Richardissimo

                values[ newLength ] = v;
00007FFBC1AC8993  cmp         eax,r10d  
00007FFBC1AC8996  jae         00007FFBC1AC89B0  
00007FFBC1AC8998  movsxd      rsi,eax  
00007FFBC1AC899B  mov         dword ptr [r8+rsi*4+10h],r11d  
                ++newLength;
00007FFBC1AC89A0  inc         eax

                values[ newLength++ ] = v;
00007FFBC1AD8993  lea         esi,[rax+1]
00007FFBC1AD8996  cmp         eax,r10d  
00007FFBC1AD8999  jae         00007FFBC1AD89B3  
00007FFBC1AD899B  movsxd      rax,eax  
00007FFBC1AD899E  mov         dword ptr [r8+rax*4+10h],r11d  
00007FFBC1AD89A3  mov         eax,esi

Nice answer, now i better get back to what i should be doing — TheGeneral, Oct 13 '18 at 06:45
*"I never use post-increment/decrement... it requires a temp memory alloc"* - I remember this from old C/C++ compilers but it [doesn't apply to C#](https://stackoverflow.com/questions/467322/is-there-any-performance-difference-between-i-and-i-in-c). — Richardissimo, Oct 13 '18 at 19:25
@Richardissimo - as I stated in a comment in another answer, it does not appear to matter for uses when the pre-incremented/decremented value is not used (i.e. compiler optimizes it), but it certainly affects performance when it is used (i.e. `values[ newLength++ ]` is indeed slower than `values[ newLength ]; newLength++;`). I tested the above in both scenarios for the edge case (very few matches) and averaged about an 11% performance difference. Give it a try yourself! As a result, I suggest using `++i` instead of `i++` to get in the habit of never using the post-increment/decrement operators — Moho, Oct 14 '18 at 05:33
The places where I've seen it, it is as shown in the accepted answer to the question I linked to: they compile to the same IL, so I'm interested in what you're describing, since it implies that isn't the case. Perhaps you should consider posting your code and the IL it generates as an answer to that question. — Richardissimo, Oct 14 '18 at 14:07
I explained it; perhaps you should consider taking the above code and viewing the IL it generates if you doubt me and you're interested in learning the truth. post-inc/dec ops must create a temp variable, assign the current value to that temp value, perform the increment/decrement op, then return the value of the temp var... versus using the current var value in place then an increment/decrement op afterwards — Moho, Oct 14 '18 at 14:36

Modifying an array "in-place" in C#?

3 Answers3

Benchmarks