2

This is my input data :

 DeviceID   ContentID   Use
 D1 C1  0.678491346
 D1 C2  0.302147374
 D2 C1  0.695790066
 D2 C2  0.645849165
 D3 C1  0.83503997
 D3 C2  0.3622916

The expected output :

DeviceID    ContentID_1 Use_1   ContentID_2 Use_2
D1  C1  0.678491346 C2  0.302147374
D2  C1  0.695790066 C2  0.645849165
D3  C1  0.83503997  C2  0.3622916

I tried to reshape it using reshape2, but wasn't able to get it in the required format.

I tried :

 df %>% 
   group_by(DeviceID) %>% 
   mutate(rn = paste0("Content",row_number())) %>% 
   spread(rn, Use)

and

dcast(df,
      DeviceID~ContentID,
      value.var ="Use")

Any help would be appreciated!

Megha John
  • 153
  • 1
  • 12

5 Answers5

2

we can use Reduce and aggregate to rearrange it in the format given above

data.frame(Reduce(cbind,aggregate(.~Device,dat,I)))[c(1,2,4,3,5)]
  init V2          V4 V3          V5
1   D1 C1 0.678491346 C2 0.302147374
2   D2 C1 0.695790066 C2 0.645849165
3   D3 C1  0.83503997 C2   0.3622916

This is a brute force of library dplyr

 map2_dfc(s<-dat%>%spread(Id,Content),names(s),~rev(stack(s,.y)))
     ind values ind1   values1 ind2   values2
1 Device     D1   C1 0.6784913   C2 0.3021474
2 Device     D2   C1 0.6957901   C2 0.6458492
3 Device     D3   C1 0.8350400   C2 0.3622916
Onyambu
  • 67,392
  • 3
  • 24
  • 53
2

Posting my solution to this:

library(splitstackshape)
library(tidyverse)

df %>%
  group_by(DeviceId) %>%
  summarise_all(function(x) paste0(x, collapse = "_")) %>%
  cSplit(names(.)[-1], '_')

Including the suggestion from @AntoniosK in the comments.

Megha John
  • 153
  • 1
  • 12
2

As of v1.9.6 (on CRAN 19 Sep 2015), data.table can cast multiple value columns simultaneously:

library(data.table)
dcast(setDT(df), DeviceID ~ rowid(DeviceID), value.var = c("ContentID", "Use"))
    DeviceID  ContentID_1  ContentID_2     Use_1     Use_2
1:        D1           C1           C2 0.6784913 0.3021474
2:        D2           C1           C2 0.6957901 0.6458492
3:        D3           C1           C2 0.8350400 0.3622916

Changing column order

The result contains the expected columns but in a different order. dcast() creates the new columns grouped by value.var.

The OP has not indicated whether the exact column order is of importance or not. However, the expected result can be reproduced exactly by changing the column order by reference, i.e., without copying the whole data object using setcolorder():

cols <- c("ContentID", "Use")
wide <- dcast(setDT(df), DeviceID ~ rowid(DeviceID), value.var = cols)
new_col_order <- CJ(seq_len(uniqueN(df$ContentID)), cols)[, paste(V2, V1, sep = "_")]
setcolorder(wide, new_col_order)
wide
   ContentID_1     Use_1 ContentID_2     Use_2 DeviceID
1:          C1 0.6784913          C2 0.3021474       D1
2:          C1 0.6957901          C2 0.6458492       D2
3:          C1 0.8350400          C2 0.3622916       D3

CJ() is a cross join of the row ids with the value.vars to create the column names in the desired order.

I have filed a feature request on GitHub to optionally change the order of columns in dcast().

Data

library(data.table)
df <- fread(
  " DeviceID  ContentID  Use
 D1 C1  0.678491346
  D1 C2  0.302147374
  D2 C1  0.695790066
  D2 C2  0.645849165
  D3 C1  0.83503997
  D3 C2  0.3622916"
)
Community
  • 1
  • 1
Uwe
  • 41,420
  • 11
  • 90
  • 134
1
df = read.table(text = "
DeviceId  ContentID  Use
D1 C1  0.678491346
D1 C2  0.302147374
D2 C1  0.695790066
D2 C2  0.645849165
D3 C1  0.83503997
D3 C2  0.3622916
", header=T, stringsAsFactors=F)

library(tidyverse)

df %>%
  group_by(DeviceId) %>%
  summarise_all(function(x) paste0(x, collapse = "_")) %>%
  separate(ContentID, c("ID_1","ID_2"), sep="_") %>%
  separate(Use, c("Use_1","Use_2"), sep="_")

# # A tibble: 3 x 5
#   DeviceId ID_1  ID_2  Use_1       Use_2      
# * <chr>    <chr> <chr> <chr>       <chr>      
# 1 D1       C1    C2    0.678491346 0.302147374
# 2 D2       C1    C2    0.695790066 0.645849165
# 3 D3       C1    C2    0.83503997  0.3622916 
AntoniosK
  • 15,991
  • 2
  • 19
  • 32
  • If I gave more than 2 categories? My actual data has n number of categories, is there a way to make the `into` argument in `separate` dynamic? – Megha John Jan 30 '18 at 10:11
  • I wish there was a `separate_all` function :-) You'll probably have to create your own function or use a different package. I'll have a look. – AntoniosK Jan 30 '18 at 10:13
  • 1
    `df %>% group_by(DeviceId) %>% summarise_all(function(x) paste0(x, collapse = "_")) %>% cSplit(c('ContentID', 'Use'), '_')` using `cSplit' from the `splitstackshape` package Thank you @AntoniosK for setting me in the right direction! – Megha John Jan 30 '18 at 10:14
  • 1
    Great! Or maybe `cSplit(names(.)[-1], '_')` if you want to automatically exclude first column, without typing all column names of interest... – AntoniosK Jan 30 '18 at 10:18
  • That is a neat trick! – Megha John Jan 30 '18 at 10:19
  • 1
    Also keep in mind that within `summarise_all` you can use the `funs()` argument, like this `summarise_all(funs(paste0(., collapse = "_")))`. This will give you the same result. However, `funs()` can be used to apply multiple functions to multiple columns, like this: `summarise_all(funs(a = paste0(., collapse = "_"), b = paste0(., collapse = "||")))` which might be really useful in a different case in the future. – AntoniosK Jan 30 '18 at 10:25
1
library(data.table)
DT <- setDT(df)

Using your dcast intent, you could do

    Reduce(function(dtf1,dtf2) merge(dtf1,dtf2, by = "DeviceId"), 
           lapply( unique(DT$ContentID),
                   function(x){dcast(DT[ContentID == x],DeviceId + ContentID  ~ ContentID ,value.var = "Use")} ))

   DeviceId ContentID.x        C1 ContentID.y        C2
1:       D1          C1 0.6784913          C2 0.3021474
2:       D2          C1 0.6957901          C2 0.6458492
3:       D3          C1 0.8350400          C2 0.3622916
denis
  • 5,580
  • 1
  • 13
  • 40