R - data.table assign the name of a column that is the minium of a row as value to new column

Question

Thanks to both of you for suggesting elegant solutions! Both solutions worked for me, but only the melt() and back-join solution worked for a data.table with dates instead of numeric values.

EDIT

I implemented the proposed data.table solution through melting and joining back with the obtained results from Wimpel as his/her solution also works with dates stored in the date columns instead of the intial toy data that was all integer values.

I prefered the readability of Peace Wang's solution though using data.table assignments and IMO it is much clearer syntax than the melt() solution, however (at least for me), it does not work with columns of type date.

Benchmarking both solutions for numeric/integer data, saw the melt() solution as clear winner.

EDIT 2 To replicate the NA-values through conversion that I get if I implement the solution proposed by Peace Wang, see below for the corrected version of the input data.table.

I have sth like this: Image a list of patient records with measurements taken at various dates. The colnames of the date columns would be sth like "2020-12-15" / "2021-01-15" etc.

 ID   Date_1       Date_2      Date_3   
  1   1990-01-01   1990-02-01  1990-03-01      
  2   1990-01-01   1990-02-01  1990-03-01       
  3   1990-01-01   1982-02-01  1990-03-01

I have determined the mimum value of each row in my data.table dt like this:

dt <- dt[, Min := do.call(pmin, c(.SD, list(na.rm = TRUE))), .SDcols = -(1)]

So far so good. Now I want to add a new col Min_Date stating the corresponding col name (aka date in my example) of the found miniumum value per row to finally get sth lik this:

  ID   Date_1       Date_2      Date_3        Min        Min_Date
  1    1990-01-01   1990-02-01  1990-03-01   1990-01-01  Date_1
  2    1990-01-01   1990-02-01  1990-03-01   1990-01-01  Date_1
  3    1990-01-01   1982-02-01  1990-03-01   1982-02-01  Date_2

I tried variations of:

dt <- dt[, Min_Date := do.call(which.pmin, c(.SD, list(na.rm = TRUE))),
                           .SDcols = (2:4)]

and then trying to do sth with the col index. Don't really know my way around .I yet, but I couldn't make it work when used in sth along these lines:

exclusions.dt[exclusions.dt[, .I[which.min(.SD)], ISSUE_ID, .SDcols = (2:6)]$V1]

Would appreciate any pointer!

Does this answer your question? [For each row return the column name of the largest value](https://stackoverflow.com/questions/17735859/for-each-row-return-the-column-name-of-the-largest-value) — Peace Wang, Mar 01 '21 at 01:56
dt[,(date_min = colnames(.SD)[apply(.SD,1,which.min)]),.SDcols=!1] — Peace Wang, Mar 01 '21 at 01:59
Thanks for your suggestion. Just powered down my computer. I‘ll try it first things tomorrow morning and report back. — aimbotter21, Mar 01 '21 at 02:24
Understanding what the proposed does internally, I wonder why it is giving me repetitive warnings about ```NAs created through conversion``` ... — aimbotter21, Mar 01 '21 at 12:19
I don't get any warnings. What's the code? Maybe you can try to check each 'atom' part from a compound code. — Peace Wang, Mar 01 '21 at 13:10
While obviously ```dt[, colnames(.SD), .SDcols=!1]``` returns the names of the columns and I understand we apply row-wise the function ```which.min()``` that is supposed to return the corresponding column index so we can feed it into the .SD - subset data.table and extract the column name — aimbotter21, Mar 01 '21 at 13:15
However applying your suggested line of code to my specific example that only requires changing out the data.table name to sth else as the structure of the table is equal, I get with ```exclusions.dt[,(EXCL_REASON = colnames(.SD)[apply(.SD, 1, which.min)]), .SDcols = !1]``` the said error.... — aimbotter21, Mar 01 '21 at 13:17
I don't fully understand yet the placing of the square brackets and I tried a variation being: ```exclusions.dt[, EXCL_REASON := colnames(.SD[, apply(.SD, 1, which.min)]), .SDcols = !1]``` but the same ````generated NA``` warning. Thank you however for the great starting point!!! I will keep trying to figure out what is causing this on my end — aimbotter21, Mar 01 '21 at 13:20
One more question though: Would this theoretically still work with NAs in the original base data.table, so that some rows contain ```NAs``` in the date columns? I did also try a version with specified in the same line of code ```colnames(.SD)[apply(.SD, 1, which.min, na.rm = TRUE)])``` but that didn't run saying ```unused argument``` — aimbotter21, Mar 01 '21 at 13:23
```:=``` will create a new column on the data.table. ```=``` will extract sth as a columns( not change original data.table). — Peace Wang, Mar 01 '21 at 13:28
```dt[,.(date_min = colnames(.SD)[apply(.SD,1,which.min)]),.SDcols=!1]``` This will be better, ```.( )`` means ```list( )```. — Peace Wang, Mar 01 '21 at 13:31
For the ``NAs`` case, you can check this, https://stackoverflow.com/questions/22362561/how-to-preserve-na-values-when-using-which-min-with-apply/22363081. Or you can try to exclude the NAs with ```i``` in ```dt[i,j,by]```. — Peace Wang, Mar 01 '21 at 13:34
Great, I'll give it try. But, just to clarify - I actually do want to add a new column named ```date_min``` to the data.table ```dt``` and not just extract the information. I first find the minimum value and store it in the column ```Min``` that is added to the original ```dt``` and then subsequently I want to store the measurement date that corresponds to each ```IDs``` minimum measurement. This is why I used ```:=```. I thought this was the correct way of adding a newly computed column — aimbotter21, Mar 01 '21 at 13:37
I am sorry to say, but still this is giving me the same error/warning. I am bummed. I'll keep trying anyways! Thanks again — aimbotter21, Mar 01 '21 at 13:43

score 1 · Answer 1 · answered Mar 01 '21 at 14:02

1

The following code can work.

dt <- fread("
             ID   Date_1   Date_2   Date_3
  1    100      200      300
  2    100      500      300
  3    200      150      400
")
dt[, `:=`(Min = do.call(pmin, c(.SD, list(na.rm = TRUE))),
          date_min = colnames(.SD)[apply(.SD, 1, which.min)]
         ), 
   .SDcols = -1]

If you got the NAs warning, maybe you can try to refresh dt firstly.

answered Mar 01 '21 at 14:02

Peace Wang

2,399
1
8
15

Thanks again Peace Wang for your proposed solution. However, I do still get ```NAs generated through conversion warnings```. As my data.table is quite long and the continiously printed warning to the console slowed things down tremendously, I wrapped the solution code into ```suppressWarnings({ your code })``` and inspected the output. The first part works as expected, computing the ```Min``` and storing it in the ```Min``` column. However, the ```date_min``` column is filled with ```NAs``` only. – aimbotter21 Mar 01 '21 at 14:22
Do you get the warning after you copy and run my whole code? I suggest you to clean your variable environment with ```rm(list = ls())``` (expecially for **dt**). – Peace Wang Mar 01 '21 at 14:29
Huh, that's a bummer. Based on the toy data, all runs smoothly as expected.... I'll try again after clearing my environment – aimbotter21 Mar 01 '21 at 14:32
That makes sense. I guess your ```NAs generated through conversion warnings``` warning is due to your multi ```:=``` on ```dt``` or ```exclusions.dt```. The first ```:=``` operation should run everything as expected after clearing your environment. – Peace Wang Mar 01 '21 at 14:38
I edited the main question so that the data includes now real dates instead of toy integer values. With these values, you should also see ```NAs``` in the ```date_min``` column on your end – aimbotter21 Mar 01 '21 at 15:06
It looks now like this. ```fread("ID Date_1 Date_2 Date_3 1 1990-01-01 1990-02-01 1990-03-01 2 1990-01-01 1990-02-01 1990-03-01 1990-01-01 1982-02-01 990-03-01") ``` – aimbotter21 Mar 01 '21 at 15:07

Wimpel · Accepted Answer · 2021-03-01T18:59:11.683

1

Here is another data.table approach

#sample data
library( data.table )
DT <- fread("ID   Date_1   Date_2   Date_3   
  1    100      200      300      
  2    100      500      300      
  3    200      150      400    ")

#melt to long format and get rows with minimum value by ID
DT.min <- melt( DT, id.vars = "ID" )[ , .SD[ which.min(value) ], by = ID]
#    ID variable value
# 1:  1   Date_1   100
# 2:  2   Date_1   100
# 3:  3   Date_2   150

#join back to DT
DT[ DT.min, `:=`( Min = i.value, Min_Date = i.variable ), on = .(ID)][]
#    ID Date_1 Date_2 Date_3 Min Min_Date
# 1:  1    100    200    300 100   Date_1
# 2:  2    100    500    300 100   Date_1
# 3:  3    200    150    400 150   Date_2

edited Mar 01 '21 at 18:59

answered Mar 01 '21 at 14:53

Wimpel

26,031
1
20
37

Thanks for the suggestion. I'll give it a try. Just for clarification, where is the ```DT.lookup``` coming from? – aimbotter21 Mar 01 '21 at 15:05
Ah, it took a while to fully comprehend and implement this conversion in my function, but basically the ```DT.lookup``` is the computed ```DT.min``` in the prior step that we want as a ```lookup``` for the subsequent join as it contains all the wanted data. - Did I get this right? It works! Much obliged! – aimbotter21 Mar 01 '21 at 15:21
One follow-up question though. How would this solution behave if within a given row there are more than n>1 date that are equal to the minimum of a given row? Would it then just select the first matched (as seen from LHS to RHS) value's column name as ```Min_Date```? – aimbotter21 Mar 01 '21 at 15:41
Yes. And sorry for the naming mixup :) – Wimpel Mar 01 '21 at 18:02

R - data.table assign the name of a column that is the minium of a row as value to new column

2 Answers2