0

I have a data table which contains one million records and I try to create a new column based on month.idx:

dt[, new_col := get(paset0("month_",month.idx)]

and it only works for the first line.

Can anyone help me with this problem? thanks!

Data
    id month_1 month_2 month_3 month_4 month_5 month.idx
1:  x1       1       1       1       0       1         3
2:  x2       0       0       0       1       0         4
3:  x3       1       0       0       0       0         1
4:  x4       0       0       0       0       0         5
5:  x5       1       1       0       0       1         2
6:  x6       0       1       0       1       1         3
7:  x7       0       0       1       1       1         4
8:  x8       0       0       0       0       0         1
9:  x9       0       0       0       0       1         5

results:
    id month_1 month_2 month_3 month_4 month_5 month.idx new_col
1:  x1       1       1       1       0       1         3       1
2:  x2       0       0       0       1       0         4       0
3:  x3       1       0       0       0       0         1       0
4:  x4       0       0       0       0       0         5       0
5:  x5       1       1       0       0       1         2       0
6:  x6       0       1       0       1       1         3       0
7:  x7       0       0       1       1       1         4       1
8:  x8       0       0       0       0       0         1       0
9:  x9       0       0       0       0       1         5       0

expected:
    id month_1 month_2 month_3 month_4 month_5 month.idx new_col
1:  x1       1       1       1       0       1         3       1
2:  x2       0       0       0       1       0         4       1
3:  x3       1       0       0       0       0         1       1
4:  x4       0       0       0       0       0         5       0
5:  x5       1       1       0       0       1         2       1
6:  x6       0       1       0       1       1         3       0
7:  x7       0       0       1       1       1         4       0
8:  x8       0       0       0       0       0         1       0
9:  x9       0       0       0       0       1         5       1
NelsonGon
  • 13,015
  • 7
  • 27
  • 57
WayneZhou
  • 1
  • 2
  • It's generally helpful to provide and test code that reproduces the issue. (In this case, we don't have code to create the data.table and you didn't test and notice the typo paset0) If interested, eg for your next question, there's some guidance here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 – Frank Mar 28 '19 at 15:14

1 Answers1

2

Here are 2 options:

1) Using get row by row taking in Frank’s comment:

DT[, new_col := get(paste0("month_", month.idx)), by= month.idx]

2) Melt and then join to do a lookup

DT[, variable := paste0("month_", month.idx)]
DT[melt(DT, id.vars="id", measure.vars=patterns("^month_")), 
    on=.(id, variable), new_col := value]

Speed is dependent on the number of rows & month columns that you have.

data:

DT <- fread("id month_1 month_2 month_3 month_4 month_5 month.idx
x1       1       1       1       0       1         3
x2       0       0       0       1       0         4
x3       1       0       0       0       0         1
x4       0       0       0       0       0         5
x5       1       1       0       0       1         2
x6       0       1       0       1       1         3
x7       0       0       1       1       1         4
x8       0       0       0       0       0         1")
chinsoon12
  • 25,005
  • 4
  • 25
  • 35
  • 1
    Re speed, the first way might be faster if you group by month.idx instead of row (if there are many more rows than columns) – Frank Mar 28 '19 at 15:16