Reshape data within groups - groups in a single row

Question

This is my input data :

 DeviceID   ContentID   Use
 D1 C1  0.678491346
 D1 C2  0.302147374
 D2 C1  0.695790066
 D2 C2  0.645849165
 D3 C1  0.83503997
 D3 C2  0.3622916

The expected output :

DeviceID    ContentID_1 Use_1   ContentID_2 Use_2
D1  C1  0.678491346 C2  0.302147374
D2  C1  0.695790066 C2  0.645849165
D3  C1  0.83503997  C2  0.3622916

I tried to reshape it using reshape2, but wasn't able to get it in the required format.

I tried :

 df %>% 
   group_by(DeviceID) %>% 
   mutate(rn = paste0("Content",row_number())) %>% 
   spread(rn, Use)

and

dcast(df,
      DeviceID~ContentID,
      value.var ="Use")

Any help would be appreciated!

`dcast(df,Device ID~Content ID, value.var ="Use")`. But I get the Content Id as columns. — Megha John, Jan 30 '18 at 09:16

Onyambu · Answer 1 · 2018-01-30T10:57:33.807

2

we can use Reduce and aggregate to rearrange it in the format given above

data.frame(Reduce(cbind,aggregate(.~Device,dat,I)))[c(1,2,4,3,5)]
  init V2          V4 V3          V5
1   D1 C1 0.678491346 C2 0.302147374
2   D2 C1 0.695790066 C2 0.645849165
3   D3 C1  0.83503997 C2   0.3622916

This is a brute force of library dplyr

 map2_dfc(s<-dat%>%spread(Id,Content),names(s),~rev(stack(s,.y)))
     ind values ind1   values1 ind2   values2
1 Device     D1   C1 0.6784913   C2 0.3021474
2 Device     D2   C1 0.6957901   C2 0.6458492
3 Device     D3   C1 0.8350400   C2 0.3622916

edited Jan 30 '18 at 10:57

answered Jan 30 '18 at 09:30

Onyambu

67,392
3
24
53

Thank you! Is there a way to do this using dplyr/reshape2? – Megha John Jan 30 '18 at 09:44
I cannot think of any right now. But if there is, one will post it here. – Onyambu Jan 30 '18 at 09:47
Could you explain `aggregate(.~Device,dat,I)`? – Megha John Jan 30 '18 at 09:57
`aggregate` takes in data and groups it with respect to the provided factor then performs the function on the grouped data. my function is the `I ` (identity) function. essentially just dividing the data but outputting that same data – Onyambu Jan 30 '18 at 10:10
eg try `aggregate(.~Species,iris,mean)` – Onyambu Jan 30 '18 at 10:13

Megha John · Accepted Answer · 2018-01-30T10:51:59.100

2

Posting my solution to this:

library(splitstackshape)
library(tidyverse)

df %>%
  group_by(DeviceId) %>%
  summarise_all(function(x) paste0(x, collapse = "_")) %>%
  cSplit(names(.)[-1], '_')

Including the suggestion from @AntoniosK in the comments.

edited Jan 30 '18 at 10:51

answered Jan 30 '18 at 10:18

Megha John

153
1
12

score 2 · Answer 3 · edited Jun 20 '20 at 09:12

As of v1.9.6 (on CRAN 19 Sep 2015), data.table can cast multiple value columns simultaneously:

library(data.table)
dcast(setDT(df), DeviceID ~ rowid(DeviceID), value.var = c("ContentID", "Use"))

    DeviceID  ContentID_1  ContentID_2     Use_1     Use_2
1:        D1           C1           C2 0.6784913 0.3021474
2:        D2           C1           C2 0.6957901 0.6458492
3:        D3           C1           C2 0.8350400 0.3622916

Changing column order

The result contains the expected columns but in a different order. dcast() creates the new columns grouped by value.var.

The OP has not indicated whether the exact column order is of importance or not. However, the expected result can be reproduced exactly by changing the column order by reference, i.e., without copying the whole data object using setcolorder():

cols <- c("ContentID", "Use")
wide <- dcast(setDT(df), DeviceID ~ rowid(DeviceID), value.var = cols)
new_col_order <- CJ(seq_len(uniqueN(df$ContentID)), cols)[, paste(V2, V1, sep = "_")]
setcolorder(wide, new_col_order)
wide

   ContentID_1     Use_1 ContentID_2     Use_2 DeviceID
1:          C1 0.6784913          C2 0.3021474       D1
2:          C1 0.6957901          C2 0.6458492       D2
3:          C1 0.8350400          C2 0.3622916       D3

CJ() is a cross join of the row ids with the value.vars to create the column names in the desired order.

I have filed a feature request on GitHub to optionally change the order of columns in dcast().

Data

library(data.table)
df <- fread(
  " DeviceID  ContentID  Use
 D1 C1  0.678491346
  D1 C2  0.302147374
  D2 C1  0.695790066
  D2 C2  0.645849165
  D3 C1  0.83503997
  D3 C2  0.3622916"
)

score 1 · Answer 4 · answered Jan 30 '18 at 10:04

1

df = read.table(text = "
DeviceId  ContentID  Use
D1 C1  0.678491346
D1 C2  0.302147374
D2 C1  0.695790066
D2 C2  0.645849165
D3 C1  0.83503997
D3 C2  0.3622916
", header=T, stringsAsFactors=F)

library(tidyverse)

df %>%
  group_by(DeviceId) %>%
  summarise_all(function(x) paste0(x, collapse = "_")) %>%
  separate(ContentID, c("ID_1","ID_2"), sep="_") %>%
  separate(Use, c("Use_1","Use_2"), sep="_")

# # A tibble: 3 x 5
#   DeviceId ID_1  ID_2  Use_1       Use_2      
# * <chr>    <chr> <chr> <chr>       <chr>      
# 1 D1       C1    C2    0.678491346 0.302147374
# 2 D2       C1    C2    0.695790066 0.645849165
# 3 D3       C1    C2    0.83503997  0.3622916

answered Jan 30 '18 at 10:04

AntoniosK

15,991
2
19
32

If I gave more than 2 categories? My actual data has n number of categories, is there a way to make the `into` argument in `separate` dynamic? – Megha John Jan 30 '18 at 10:11
I wish there was a `separate_all` function :-) You'll probably have to create your own function or use a different package. I'll have a look. – AntoniosK Jan 30 '18 at 10:13
1

`df %>% group_by(DeviceId) %>% summarise_all(function(x) paste0(x, collapse = "_")) %>% cSplit(c('ContentID', 'Use'), '_')` using `cSplit' from the `splitstackshape` package Thank you @AntoniosK for setting me in the right direction! – Megha John Jan 30 '18 at 10:14
1

Great! Or maybe `cSplit(names(.)[-1], '_')` if you want to automatically exclude first column, without typing all column names of interest... – AntoniosK Jan 30 '18 at 10:18
That is a neat trick! – Megha John Jan 30 '18 at 10:19
1

Also keep in mind that within `summarise_all` you can use the `funs()` argument, like this `summarise_all(funs(paste0(., collapse = "_")))`. This will give you the same result. However, `funs()` can be used to apply multiple functions to multiple columns, like this: `summarise_all(funs(a = paste0(., collapse = "_"), b = paste0(., collapse = "||")))` which might be really useful in a different case in the future. – AntoniosK Jan 30 '18 at 10:25

score 1 · Answer 5 · answered Jan 30 '18 at 10:23

library(data.table)
DT <- setDT(df)

Using your dcast intent, you could do

    Reduce(function(dtf1,dtf2) merge(dtf1,dtf2, by = "DeviceId"), 
           lapply( unique(DT$ContentID),
                   function(x){dcast(DT[ContentID == x],DeviceId + ContentID  ~ ContentID ,value.var = "Use")} ))

   DeviceId ContentID.x        C1 ContentID.y        C2
1:       D1          C1 0.6784913          C2 0.3021474
2:       D2          C1 0.6957901          C2 0.6458492
3:       D3          C1 0.8350400          C2 0.3622916

Reshape data within groups - groups in a single row

5 Answers5

Changing column order

Data