0

I am trying to transpose a two column data table. The dt is of something like this.

access.id | data
    1        0
    1        0
    2       150
   ...      ...

What I want to have is something like this.

access.id | V1 | V2
    1       0    0
    2      150  ...

I have tried different approaches but I can't seem to get the result I want.

test[, lapply(.SD, .fun=function(dt) { return (transpose(dt)) }) , by = access.id]

or

test[, lapply(.SD, .fun=function(dt) { return (transpose(dt[data])) }) , by = access.id]

or

test[, .SD[, transpose(data)] , by = access.id]

The above give me an error: "Error in transpose(dt) : l must be a list."

test[, lapply(.SD, .fun=function(dt) { return (transpose(dt$data)) }) , by = access.id]

This gives the error: ' Error in dt$data : $ operator is invalid for atomic vectors"

test[, .SD[, t(data)] , by = access.id]

This works but gives me the exact same data table.

When I tried substituting t(data) for sum(data), it worked and gave me the sum of the data columns. I am unsure why what I tried doesn't work.

So my question is, how do I transpose this data table?

IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38
vic
  • 359
  • 4
  • 18

1 Answers1

3

You can add a row id and then dcast using that new column as the right-hand-side of the formula.

df[, rid := paste0('V', rowid(access.id))]

dcast(df, access.id ~ rid, value.var = 'data')
#    access.id  V1  V2
# 1:         1   0   0
# 2:         2 150 300

This also works if you do it all in one step

dcast(df, access.id ~ paste0('V', rowid(access.id)), value.var = 'data')

Explanation:

rowid is a "Convenience function for generating a unique row ids within each group". It takes a vector x and creates a vector that starts at 1 and increments by 1 each time it sees that same value of x again, and starts over at 1 when a new value of x is encountered.

df[, rid := paste0('V', rowid(access.id))][]
#    access.id data rid
# 1:         1    0  V1
# 2:         1    0  V2
# 3:         2  150  V1
# 4:         2  300  V2

If the x is not ordered the explanation above is a little lacking, but I'm honestly not sure how to explain that situation well. Maybe this example will help.

data.table(c('a', 'a', 'b', 'a', 'b', 'b', 'a'), 
      rowid(c('a', 'a', 'b', 'a', 'b', 'b', 'a')))
#    V1 V2
# 1:  a  1
# 2:  a  2
# 3:  b  1
# 4:  a  3
# 5:  b  2
# 6:  b  3
# 7:  a  4

Since you want each row of each group as its own column, you put this new rid variable on the right-hand-side of the dcast formula. dcast wants you to put your desired columns on the right and the corresponding rows on the left. (not the best explanation, see ?dcast.)

Data used

df <- fread('
access.id   data
    1        0
    1        0
    2       150
    2       300
')
IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38
  • I'm not sure I understand this part. "~ paste0('V', rowid(access.id)" I am getting the table that I want, but the data is not in the correct order. There's access.id and then V1, V10, V11, ..., V2, V20. Is there a way to correct that? – vic Apr 25 '19 at 21:20
  • I've added an explanation for `rowid`. Not sure what's going on with your data specifically though, if you post some example data I could take a look. – IceCreamToucan Apr 25 '19 at 21:26
  • The correct data appears in the correct columns. It's just that the columns appear at an unorthodox order. – vic Apr 25 '19 at 21:39
  • It seems that the names of the columns were ordered alphabetically. I changed ~ "~ paste0('V', rowid(access.id)" to "~ rowid(access.id)" and I they are ordered numerically. Also used dcast.data.table instead of dcast. Thanks for the help. – vic Apr 25 '19 at 22:01