The fastest way to do this I could think of is to pre-process a list of positions to group together positions in the same column, and then update column-by-column with Part
. This uses the fact that your array is rectangular (not ragged). Here is the code:
ClearAll[updateByColumn];
SetAttributes[updateByColumn, HoldFirst];
updateByColumn[l_, positions_, updateFunc_, updateFuncListable : (True | False) : False] :=
MapThread[
(l[[##]] = If[updateFuncListable, updateFunc@l[[##]], updateFunc /@ l[[##]]]) &,
{#[[All, 1, 1]], #[[All, All, 2]]} &@GatherBy[positions, First]];
EDIT
This assumes that the updating does not depend on the previously updated values. If it does, one can write a more elaborate version of this code which would take that into account but will perhaps be somewhat slower.
END EDIT
Here is a small test example to see how it works:
randomString[] := FromCharacterCode@RandomInteger[{97, 122}, 5];
In[131]:=
len = 10;
poslen = 10;
n = 1;
m = 1;
tst =
Table[{
Sequence @@ RandomInteger[10000, n],
Sequence @@ Table[randomString[], {m}],
Sequence @@ RandomReal[10000, n]}, {len}
]
testPositions =
Table[{RandomInteger[{1, Length[tst]}],RandomInteger[{1, Length@First@tst}]},
{len}]
Out[135]= {{320, "iwuwy", 3082.4}, {3108, "utuwf", 4339.14}, {5799, "dzjht", 8650.81},
{3177, "biyyl", 6239.64}, {7772, "bfawf", 6704.02}, {1679, "lrbro", 1873.57},
{9866, "gtprg", 4157.83}, {9720, "mtdnx", 4379.48}, {5399, "oxlhh", 2734.21},
{4409, "dbnlx", 955.428}}
Out[136]= {{1, 2}, {4, 1}, {3, 2}, {7, 2}, {8, 1}, {5, 2}, {2, 2},
{7, 2}, {2, 2}, {6, 2}}
Here we call the function:
In[137]:=
updateByColumn[tst, testPositions, f];
tst
Out[138]= {{320, f["iwuwy"], 3082.4}, {3108, f["utuwf"], 4339.14},
{5799, f["dzjht"], 8650.81}, {f[3177], "biyyl" 6239.64}, {7772, f["bfawf"], 6704.02},
{1679, f["lrbro"], 1873.57}, {9866, f["gtprg"], 4157.83}, {f[9720], "mtdnx", 4379.48},
{5399, "oxlhh", 2734.21}, {4409, "dbnlx", 955.428}}
Note that, since the function is HoldFirst
, the original array is modified, which allows us to save on the memory that would be needed for the copy.
Now, generating the large sample with the same code as above, but with these values of parameters: len = 100000; poslen = 50000; n = 100; m = 100;
, the call updateByColumn[tst,testPositions, f];
runs in 0.15 s. on my machine, and that's without parallelization. If your updating function updateFunc
is Listable
and that makes is much faster, you can set the optional third parameter to True
to make it run potentially even faster.
You can employ more tricks to save on time/memory consumption. For example, if you know that certain columns of your original large array are filled only with certain packable numeric type (Integers, Reals or Complex), you can Map Developer`ToPackedArray
on these specific columns, to significantly reduce the memory occupied by your array. The code to pack the array would be:
tstPacked = Table[0, {Length[tst]}];
Do[tstPacked [[i]] = Developer`ToPackedArray[tst[[All, i]]], {i, Length@First@tst}];
If, e.g., you produced tst
with the above code and parameters len = 100000;poslen = 50000;n = 100;m = 10;
, applying ByteCount
gives 700800040
bytes for the array tst
, but only 182028872
bytes for tstPacked
(note that an attempt to Transpose
, then Map
Developer`ToPackedArray
, and then Transpose
again will fail, since the second Transpose
would unpack all the columns). Note also that the columns will remain packed only if your updateFunc
function produces the values of the same types as the original column elements, for each column type.
On top of this, you probably can change MapThread
to some code using say ParallelMap
, to leverage parallel capabilities.
I am a bit worried about your described dimensions of the full array. Your full array might not fit to memory - but I guess, that is another problem.