
Imagine a 12mm Data.Table.

I want to

  1. create a new column and fill it with a value computed by a complex algorithm that uses only data from the same row.

  2. split the 12mm (12 million) rows into 1mm-row sections and have one CPU thread handle each of the 12 chunks (it's a multi-core, 12-thread CPU).
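The split in step 2 is plain index arithmetic. Here is a minimal base-R sketch that assumes equal, contiguous blocks. `n_rows`, `n_workers` and the other names are illustrative and do not come from the question:

```r
n_rows    <- 12e6   # 12mm rows
n_workers <- 12L    # one worker per hardware thread

# Rows per worker, rounded up; if n_rows is not a multiple of n_workers,
# the last block is simply shorter.
chunk_size <- ceiling(n_rows / n_workers)

# First and last row index of each worker's block.
first_row <- seq(1, n_rows, by = chunk_size)
last_row  <- pmin(first_row + chunk_size - 1, n_rows)

blocks <- data.frame(worker = seq_along(first_row), first_row, last_row)
print(blocks)
```

`parallel::splitIndices(n_rows, n_workers)` also produces contiguous, near-equal blocks. It returns them as vectors of row indices rather than start/end pairs.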

This means 12 threads will each iterate through their own section of the data.table: read data from a row, compute a value, and store it in a different column of that row.
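Below is a minimal sketch of that plan in R. It assumes the per-row function reads only its own row. `complex_fn` and the columns `a` and `b` are hypothetical stand-ins, and the table is scaled down to 120,000 rows. The sketch makes one deliberate change to the plan as described. The R interpreter is single-threaded, so `parallel::mclapply()` runs the workers as forked processes, and a write made inside a forked worker is not visible to the parent. Each worker therefore returns its block of results, and the parent assigns the new column once.

```r
library(data.table)
library(parallel)

# Hypothetical stand-in for the "complex algo": it looks at one row's values
# (columns a and b here) and returns a single number.
complex_fn <- function(a, b) {
  sqrt(a^2 + b^2) + sin(a) * cos(b)
}

set.seed(42)
n  <- 120000L                                  # scaled down from 12mm rows
dt <- data.table(a = runif(n), b = runif(n))

# 12 contiguous blocks of row indices, one per worker.
chunks <- splitIndices(n, 12L)

# mclapply() forks worker processes; forking is not available on Windows,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else min(12L, detectCores(), na.rm = TRUE)

# Each worker reads only its own rows and RETURNS the computed values
# instead of writing into dt.
parts <- mclapply(chunks, function(idx) {
  a <- dt$a[idx]
  b <- dt$b[idx]
  vapply(seq_along(idx), function(k) complex_fn(a[k], b[k]), numeric(1))
}, mc.cores = n_cores)

# mclapply() returns results in the order of its input, so the blocks
# concatenate back into row order; the parent adds the column by reference.
dt[, result := unlist(parts, use.names = FALSE)]
```

Windows does not support forking. There, a socket cluster (`parallel::makeCluster()` with `parLapply()`) is the usual substitute, but `dt` and `complex_fn` then have to be shipped to the workers, for example with `clusterExport()`.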

Note:

  • the new column is created before the multi-threaded step starts
  • each thread accesses a different part of the data.table by row index
  • the function is too complex to compute any other way
  • right now the processing runs sequentially through the data.table, uses only one CPU thread, and takes over 3 hours.
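The first two notes describe pre-creating the column and then letting each worker write its own rows by index. The sketch below checks what happens to those writes. It assumes forked workers from `parallel::mclapply()` on a Unix-like system, since forking is not available on Windows. The four-row table is made up:

```r
library(data.table)
library(parallel)

dt <- data.table(x = 1:4)
dt[, y := NA_real_]            # the new column exists before any parallel work

# Two forked children each try to fill "their" rows of y in place.
# Unix-like systems only: on Windows mclapply() cannot fork.
invisible(mclapply(list(1:2, 3:4), function(rows) {
  set(dt, i = rows, j = "y", value = dt$x[rows] * 10)
  NULL
}, mc.cores = 2L))

print(dt$y)   # the parent's column is unchanged
```

Each forked child modifies its own copy of the process memory. As a result, the parent's `y` column is still all `NA` when `mclapply()` returns.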

Q1: What's a better way to get more "same machine" concurrency on 12mm rows with a complex per-row function?

Q2: What's broken with this scenario?

eAndy
  • *"Imagine a 12mm Data.Table."* That's quite thin. – Maurits Evers Apr 14 '18 at 01:36
  • Please [edit](https://stackoverflow.com/posts/49826865/edit) your question to include a [minimal reproducible example/attempt](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), including sample data. At the moment your questions are too broad for SO and lacking in specific coding details. – Maurits Evers Apr 14 '18 at 01:38
  • (1) is likely covered with `apply(dx,1,...)`. (2) might be addressed with `expand.grid` or `outer`. Have you tried anything? (Q1) Add RAM. (Q2) Honestly, I'm not sure, because *everything you've offered is hypothetical.* – r2evans Apr 14 '18 at 03:21
  • @r2evans It might not be wise to propose using `apply()` with a `data.frame` or `data.table` in general, as the dataset will be coerced to a `matrix` with *all* columns of the same type. See section 8.2.38 of Patrick Burns' [The R Inferno](https://www.burns-stat.com/pages/Tutor/R_inferno.pdf). – Uwe Apr 14 '18 at 23:18
  • You are correct ... but in a vague comparison to Schrödinger's cat, "data table", "data frame", and similar structures are not necessarily `data.table` and `data.frame` until their true state is realized by actually seeing code/data structures. – r2evans Apr 14 '18 at 23:27
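The coercion Uwe describes is easy to check. The base-R sketch below uses a made-up two-column data frame and compares `apply()` with `mapply()` over the columns. The `mapply()` alternative is an illustration only; nobody in the thread proposed it:

```r
df <- data.frame(id = 1:2, name = c("a", "b"))

# apply() converts df with as.matrix() first; because a character column is
# present, every value (including id) reaches the function as character.
via_apply <- apply(df, 1, function(row) class(row[["id"]]))

# mapply() walks the columns side by side and keeps each column's own type.
via_mapply <- mapply(function(id, name) class(id), df$id, df$name)

print(via_apply)
print(via_mapply)
```

`via_apply` reports `"character"` for both rows, while `via_mapply` reports `"integer"`.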

0 Answers