R: Is concurrent modification of a large data.table or data frame acceptable?

Question

Imagine a data.table with 12 million rows.

I want to:

1. Create a new column and compute its value with a complex algorithm that uses only data from the same row.
2. Break the 12 million rows into sections of 1 million rows and have one CPU thread handle each of the 12 chunks (it's a multi-core, 12-thread CPU).

This means 12 threads will iterate through their section of the data.table, read data from a row, compute a value and store it in a different column in the row.

Note:

- The new column is created before the multi-threaded work starts.
- Each thread accesses a different part of the data.table, by index.
- The function is too complex to do another way.
- Right now the processing runs sequentially through the data.table, uses only one CPU thread, and takes over 3 hours.

Q: What's a better way to increase "same machine" concurrency over 12 million rows when the per-row function is complex?

Q: What's broken with this scenario?
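To make the scenario concrete, here is a minimal sketch of one way such row-independent work is often split across processes with base R's `parallel` package. It is not the code from this question: `complex_fn`, the columns `a` and `b`, the 1,000-row data frame, and the worker count of 2 are all hypothetical stand-ins. It also assumes a process-based cluster, where each worker gets its own copy of a chunk and returns a vector, so the table is only modified in the parent session rather than concurrently.

```r
library(parallel)

# Hypothetical stand-in for the "complex algo": uses only values from one row.
complex_fn <- function(a, b) sqrt(abs(a)) + b^2

# Small illustrative data set (the real one would have 12 million rows).
set.seed(1)
df <- data.frame(a = rnorm(1000), b = runif(1000))

n_workers <- 2L  # assumption for the sketch; 12 on the machine described

# Assign each row to one of n_workers contiguous chunks, then split.
chunk_id <- cut(seq_len(nrow(df)), breaks = n_workers, labels = FALSE)
chunks <- split(df, chunk_id)

cl <- makeCluster(n_workers)
# Each worker receives a copy of its chunk and returns a plain numeric vector.
results <- parLapply(cl, chunks, function(chunk, f) {
  mapply(f, chunk$a, chunk$b)
}, f = complex_fn)
stopCluster(cl)

# Combine in the parent session; chunks are contiguous and in row order.
df$new_col <- unlist(results, use.names = FALSE)
```

Because the workers never write to the shared table, there is no concurrent modification to reason about. The cost is that each chunk is copied to its worker and each result is copied back, which matters more as the rows get wider.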

Please [edit](https://stackoverflow.com/posts/49826865/edit) your question to include a [minimal reproducible example/attempt](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), including sample data. At the moment your questions are too broad for SO and lacking in specific coding details. — Maurits Evers, Apr 14 '18 at 01:38
(1) is likely covered with `apply(dx,1,...)`. (2) might be addressed with `expand.grid` or `outer`. Have you tried anything? (Q1) Add RAM. (Q2) Honestly, I'm not sure, because *everything you've offered is hypothetical.* — r2evans, Apr 14 '18 at 03:21
@r2evans It might not be wise to propose using `apply()` with a `data.frame` or `data.table` in general, as the dataset will be coerced to a `matrix` with *all* columns of the same type. See section 8.2.38 of Patrick Burns's [The R Inferno](https://www.burns-stat.com/pages/Tutor/R_inferno.pdf). — Uwe, Apr 14 '18 at 23:18
You are correct ... but in a vague comparison to Schrödinger's cat, "data table", "data frame", and similar structures are not necessarily `data.table` and `data.frame` until their true state is realized by actually seeing code/data structures. — r2evans, Apr 14 '18 at 23:27


0 Answers