Fundamentally, columns are much easier to modify by-reference in R because a table is a list of columns: the list only holds pointers to its elements, so each column lives in its own block of memory rather than being stored contiguously with the others.
Removing a column by reference just means deallocating its memory and removing the associated pointer.
By contrast, removing some rows is a lot harder and can't really be done by-reference -- some copying is inevitable. Consider this simplified representation of a table with two columns, `A` and `B`:
    1  2  3  4  5
A: [ ][ ][ ][ ][ ]
B: [ ][ ][ ][ ][ ]
`A` is stored in contiguous memory as an array of size `5*sizeof(A)`, where `sizeof(A)` is the size of one cell. E.g. if `A` is an `integer`, it's given 4 bytes per cell; a `numeric` gets 8 bytes per cell.
Deleting `B` is easy from a memory point of view: just tell R/your system you don't need that memory anymore:
    1  2  3  4  5
A: [ ][ ][ ][ ][ ]
B: [x][x][x][x][x]
`A`'s memory allocation is unaffected.
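To make the column case concrete, here is a minimal sketch in C. The `Table` type and the `table_*` functions are hypothetical names invented for this illustration; they only loosely mirror an R list of vectors and are not R's or data.table's actual internals. Dropping a column frees one block and shuffles a few pointers, and no cell of any other column is read or written.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: a table is an array of pointers to separately
   allocated int columns (loosely like an R list of vectors). */
typedef struct {
    size_t ncol;
    size_t nrow;
    int **cols;   /* cols[j] points to column j's own memory block */
} Table;

Table table_new(size_t ncol, size_t nrow) {
    Table t = { ncol, nrow, malloc(ncol * sizeof(int *)) };
    assert(t.cols != NULL);
    for (size_t j = 0; j < ncol; j++) {
        t.cols[j] = calloc(nrow, sizeof(int));
        assert(t.cols[j] != NULL);
    }
    return t;
}

/* Drop column j: free its block and close the gap in the pointer array.
   Only pointers move; the data of the remaining columns stays in place. */
void table_drop_col(Table *t, size_t j) {
    assert(j < t->ncol);
    free(t->cols[j]);
    for (size_t k = j + 1; k < t->ncol; k++)
        t->cols[k - 1] = t->cols[k];
    t->ncol--;
}

void table_free(Table *t) {
    for (size_t j = 0; j < t->ncol; j++)
        free(t->cols[j]);
    free(t->cols);
    t->cols = NULL;
    t->ncol = 0;
}
```

After `table_drop_col(&t, 1)` on a two-column table, `t.cols[0]` is the very same pointer as before, which is the sense in which `A` is unaffected.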
By contrast, consider removing some rows from the table (i.e., from both `A` and `B`):
    1  2  3  4  5
A: [ ][x][x][ ][ ]
B: [ ][x][x][ ][ ]
If we simply release the memory for these 4 cells, our table will be broken -- each column's memory has been split by a `2*sizeof(A)`-size gap between its 1st and 4th rows.
The best we can do is to try and minimize copying by shifting rows 4 & 5, and leaving row 1 alone:
    1  2  3<-4<-5
A: [ ][x][x][ ][ ]
B: [ ][x][x][ ][ ]

    1  4  5
A: [ ][ ][ ]
B: [ ][ ][ ]
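The shift in the illustration can be sketched in C for a single column. `column_drop_rows` is a hypothetical helper written for this answer, not a function from R or data.table, and it assumes the rows to delete form one contiguous range. It slides the surviving tail down over the deleted cells and leaves the rows before the range untouched.

```c
#include <assert.h>
#include <string.h>

/* Remove `count` rows starting at 0-based row `first` from one column of
   `nrow` ints by sliding the tail down. Returns the new row count. */
size_t column_drop_rows(int *col, size_t nrow, size_t first, size_t count) {
    assert(first <= nrow && count <= nrow - first);
    size_t tail = nrow - first - count;   /* rows after the deleted range */
    /* memmove rather than memcpy: source and destination may overlap */
    memmove(col + first, col + first + count, tail * sizeof *col);
    return nrow - count;
}
```

For the table above, `column_drop_rows(A, 5, 1, 2)` copies two cells (rows 4 and 5) and returns 3, and the same call has to be repeated for `B` and every other column. In R the vector's length is also stored in its header, so a real implementation would have to update that too; this sketch ignores it.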
In the linked answer, Matt alludes to a very specific case in which the by-reference approach can work -- when the rows to add/drop come at the end. Hopefully the illustration makes it clear why this is easier to do.
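Here is a rough sketch of why the end-of-table case is cheap, assuming a hypothetical `Column` type that tracks both the rows in use (`len`) and the rows allocated (`cap`). This is an illustrative model only, not a description of how R or data.table actually store vectors. Dropping trailing rows only changes a counter, and appending needs no copy as long as spare capacity remains.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical column with spare capacity at the end. */
typedef struct {
    int *data;
    size_t len;   /* rows currently in use */
    size_t cap;   /* rows allocated */
} Column;

/* Drop the last n rows: no cell is moved, only the length changes. */
void column_drop_tail(Column *c, size_t n) {
    assert(n <= c->len);
    c->len -= n;
}

/* Append one row in place if there is spare capacity.
   Returns 1 on success, 0 if a reallocation (and copy) would be needed. */
int column_push(Column *c, int value) {
    if (c->len == c->cap)
        return 0;
    c->data[c->len++] = value;
    return 1;
}
```

In both functions `c->data` keeps pointing at the same block, so anything else holding that pointer still sees the same column.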
This technical difficulty is the reason why the linked feature request is so hard to fulfill. Copying many columns' data as illustrated is easier said than done and requires a lot of finesse to get it working and communicated back to R from C properly.