4

This maybe a silly question. But I want to insert a role before each group of data. All I can find online is how to insert a row after each group. Also, the inserted row will have a sequence number represent each ID. For example, i have a data table like

df 
ID     TIME   VAR VALUE
101     07/02   V1     9
101     07/03   V2     NA
101     07/03   V3     10
221     06/01   V1     2
221     07/03   V2     4

I want something like:

ID     TIME   VAR     VALUE
101       NA   sequence   1
101     07/02   V1        9
101     07/03   V2        NA
101     07/03   V3        10
221       NA   sequence   2
221     06/01   V1        2
221     07/03   V2        4

The real table have around 14 millions rows with 14,000 unique ID. Is there a quick way to update the table? Thank you!

VeraShao
  • 63
  • 8
  • 3
    `df[, rbind(.(write_your_tuple_here), .SD), by=ID]` ? There is no way to update a table to add/remove rows without creating a new table yet. https://stackoverflow.com/q/10790204/ In your case, it looks like you have used `melt` on a wide table.. if so, it'd probably be easier to add the extra column in that format before melting. – Frank Aug 03 '17 at 15:07
  • Hi Frank, the first option is not working, or I may put the wrong thing inside (write_your_tuple_here). For the melt and extra column, I am not sure how it can be down. Do you might explain it a little bit more? Thank you! – VeraShao Aug 03 '17 at 15:17
  • 1
    Hm, not sure. Is it the case that every ID has one row for V1, one for V2, one for V3? I meant `.(TIME = NA_character_, VAR = "sequence", VALUE = .GRP)` as your tuple, or whatever the rule is for the inserted row. (.GRP is sequence number, see `?.GRP`.) – Frank Aug 03 '17 at 15:20
  • No, it is not that case that every ID will only have one V1,V2,and V3. There may be multiple of them, as well as V4,V5... I am trying to modify the tuple, no success yet. But it is nice to know the .GRP function. – VeraShao Aug 03 '17 at 15:30
  • Ok. Fwiw, here's an example where the `rbind` approach works: `DT = data.table(ID= 1:2, v = 3:4, v2 = 5:6); DT[, rbind(.(v = 0, v2 = .GRP), .SD), by=ID]` – Frank Aug 03 '17 at 15:33
  • @Frank, It works now. I modify it a little bit. The .GRP is a amazing function. Thank you so much! – VeraShao Aug 03 '17 at 15:41
  • Np, glad it worked out. Feel free to post an answer with whatever worked for you. – Frank Aug 03 '17 at 15:44
  • @Frank. Another small question. If I want to change the TIME from NA to the earliest time minus one day. How can I do it? I am trying using TIME= df$TIME[1] -1. But It seems not working. – VeraShao Aug 03 '17 at 16:31
  • You shouldn't need to use `df$` inside a data.table. In this case, I guess, `first(TIME) - 1`. Anyways, you'll want to read the vignettes soon to get a handle on syntax like this, I guess. – Frank Aug 03 '17 at 16:37

1 Answers1

2

Might be faster as it does the rbind in bulk:

> tbl[, .SD
      ][, .N, ID
      ][, VALUE := .I
      ][, VAR := 'sequence'
      ][, N := NULL
      ][, rbind(.SD, tbl, fill=T)
      ][order(ID, VAR != 'sequence', TIME)
      ][, .(ID, TIME, VAR, VALUE)
      ]

    ID  TIME      VAR VALUE
1: 101    NA sequence     1
2: 101 07/02       V1     9
3: 101 07/03       V2    NA
4: 101 07/03       V3    10
5: 221    NA sequence     2
6: 221 06/01       V1     2
7: 221 07/03       V2     4
> 
Clayton Stanley
  • 7,513
  • 9
  • 32
  • 46