Currently I have a data.table like this:

item    dummyvar  
4q7C0o     1         
2BrKY63    1         
3drUy6I    1         
G5ALtO    1000        
G5G859    1000
PAP589    2000

Using a function I defined, I found the rows where dummyvar changes significantly; their row numbers are stored in a numeric vector called imbalance = 4 6. I would like to create a new column in my data.table in which each row number in imbalance starts a new class, for example:

item    dummyvar   Class
4q7C0o     1         1
2BrKY63    1         1
3drUy6I    1         1
G5ALtO    1000       2 
G5G859    1000       2
PAP589    2000       3
Oliver
  • Okay, I gave an answer and realised that I did not understand your `imbalance` part. Could you be more precise? I just numbered the groups, but don't know what you are trying to do. – Martin Gal Jun 01 '20 at 15:47
  • Perhaps `your_data_table[, Class := cumsum(.I %in% imbalance)]`? I'm assuming `imbalance` is a vector of the row numbers where you want `Class` to increment. (This version starts at 0, but you could stick a `+ 1` on it to start at `1`) – Gregor Thomas Jun 01 '20 at 15:51
  • Related / possible duplicate: [*How to create a consecutive index based on a grouping variable in a dataframe*](https://stackoverflow.com/q/6112803/2204410) – Jaap Jun 01 '20 at 16:00
  • @MartinGal, `imbalance` is a numeric vector where each number tells the row where the class should change, for ex. rows 1-3 should be class 1, and then rows 4-5 will be class 2 – Oliver Jun 01 '20 at 16:00
  • @Oliver Ah, I see. I removed my answer for being off-topic. – Martin Gal Jun 01 '20 at 16:02
  • @MartinGal on the contrary, thank you for your time for commenting and checking my problem. I appreciate it – Oliver Jun 01 '20 at 16:08
  • @GregorThomas Thank you, That is what I was looking for. I appreciate your help – Oliver Jun 01 '20 at 16:09

2 Answers

If I understand correctly, the vector imbalance contains the row indices into the data.table at which a new Class should start.

Here is one possible solution (among many others, I suppose) using cut() on the row index .I:

dt[, Class := cut(.I, c(0, imbalance, Inf), labels = FALSE, right = FALSE)][]
      item dummyvar Class
1:  4q7C0o        1     1
2: 2BrKY63        1     1
3: 3drUy6I        1     1
4:  G5ALtO     1000     2
5:  G5G859     1000     2
6:  PAP589     2000     3
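
To make the call above reproducible, here is a minimal sketch. It assumes the sample data from the question (rebuilt by hand here) and that imbalance holds the row numbers 4 and 6. With right = FALSE the breaks c(0, 4, 6, Inf) define the left-closed intervals [0, 4), [4, 6) and [6, Inf), so each row number in imbalance opens a new class.

```r
library(data.table)

# Sample data rebuilt from the question (an assumption, not the asker's real data)
dt <- data.table(
  item     = c("4q7C0o", "2BrKY63", "3drUy6I", "G5ALtO", "G5G859", "PAP589"),
  dummyvar = c(1, 1, 1, 1000, 1000, 2000)
)

# Row numbers at which a new class should start
imbalance <- c(4, 6)

# .I is the row index; labels = FALSE returns the interval number instead of a factor
dt[, Class := cut(.I, c(0, imbalance, Inf), labels = FALSE, right = FALSE)]

print(dt)
```

Rows 1 to 3 fall in the first interval, rows 4 and 5 in the second, and row 6 in the third.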
Uwe

Perhaps `your_data_table[, Class := cumsum(.I %in% imbalance)]`? I'm assuming `imbalance` is a vector of the row numbers where you want `Class` to increment. (This version starts at 0, but you could add `+ 1` to start at 1.)
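
A minimal sketch of this approach, with the `+ 1L` offset applied so the classes start at 1. The sample data is rebuilt by hand from the question and `imbalance` is assumed to be `c(4, 6)`; `.I %in% imbalance` is TRUE only on the rows where a class starts, and the running sum of those flags numbers the groups.

```r
library(data.table)

# Sample data rebuilt from the question (an assumption, not the asker's real data)
dt <- data.table(
  item     = c("4q7C0o", "2BrKY63", "3drUy6I", "G5ALtO", "G5G859", "PAP589"),
  dummyvar = c(1, 1, 1, 1000, 1000, 2000)
)

# Row numbers at which Class should increment
imbalance <- c(4, 6)

# Flag the rows listed in imbalance, then take a running count of the flags;
# adding 1L makes the first group class 1 instead of 0
dt[, Class := cumsum(.I %in% imbalance) + 1L]

print(dt)
```

Note that if `imbalance` contained row 1, the first class would be 2 rather than 1, since the counter would already increment on the first row.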

Gregor Thomas