1

I have a dataset as follows:

Pt    EVENT
123    GGG
123    Nor
123    tre
144    GGG
1667   tre
1667   Nor
1667   tre

I am trying to prepare the data eventually for a Sankey diagram and in order to do this I need to get the data into the following shape

Pt    
123   GGG      Nor   tre
144   GGG
1667  tre      Nor   tre

And then eventually I think into a source, target, value format as follows:

source    target   value
 GGG        Nor       1
 GGG                  1
 tre        tre       1
 Nor        tre       2

The part I don't understand is how to get from the original dataset to the second one. I thought I might be able to do it with dplyr but no joy:

  Sankey<-EndoSubset %>%
      group_by(Pt) %>% 
      select(t(EVENT))
Sebastian Zeki
  • 6,690
  • 11
  • 60
  • 125
  • There is a subtle difference between this question and the link to the proposed duplicate. The duplicate addresses the case were there is a "key" column and a "value" column (also an "id" column, `Pt` in this case). This question doesn't have a "key" column, just a "value" column; at least from the original data frame to the first transformation (the "key" column would be the group row number). The answer from "akrun" addresses this. I think there is likely a different SO post that addresses this (I haven't found it though). – steveb Jul 10 '16 at 18:48

4 Answers4

3

This can be done with reshape() by synthesizing a time column:

reshape(cbind(df,time=ave(seq_len(nrow(df)),df$Pt,FUN=seq_along)),dir='w',idvar='Pt');
##     Pt EVENT.1 EVENT.2 EVENT.3
## 1  123     GGG     Nor     tre
## 4  144     GGG    <NA>    <NA>
## 5 1667     tre     Nor     tre

Data

df <- data.frame(Pt=c(123L,123L,123L,144L,1667L,1667L,1667L),EVENT=c('GGG','Nor','tre','GGG',
'tre','Nor','tre'),stringsAsFactors=F);
bgoldst
  • 34,190
  • 6
  • 38
  • 64
2

We can use data.table

 library(data.table)
 dcast(setDT(df1), Pt~rowid(Pt), value.var="EVENT")
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    Converting to wide just aggregates the values for each Pt. akrun's answer is what I wanted. WIll accept in a minute when SO allows me to – Sebastian Zeki Jul 10 '16 at 18:35
  • 1
    A `dplyr` and `tidyr` solution that could work is `data %>% group_by(Pt) %>% mutate(rn = 1:n()) %>% ungroup %>% spread(rn, EVENT)` – steveb Jul 10 '16 at 18:51
  • @akrun Thanks for including this. I put it in as an answer but if that is not good SO etiquette (since it is now here), I will remove my answer. – steveb Jul 10 '16 at 18:59
  • @steveb I removed the answer from my post. – akrun Jul 10 '16 at 19:00
  • @akrun Thanks. Given the chain of events, I will assume all is good w.r.t. making it an answer :-) – steveb Jul 10 '16 at 19:01
2

Here is a dplyr and tidyr solution:

library(dplyr)
library(tidyr)

data %>%
     group_by(Pt) %>%
     mutate(rn = 1:n()) %>%
     ungroup %>%
     spread(rn, EVENT)
steveb
  • 5,382
  • 2
  • 27
  • 36
0

Another option:

library(data.table)
l <- sapply(unique(df$Pt), function(x) data.frame(rbind(c(x,df[df$Pt==x,]$EVENT))))
rbindlist(l, fill = T)

     # X1  X2  X3  X4
# 1:  123 GGG Nor tre
# 2:  144 GGG  NA  NA
# 3: 1667 tre Nor tre

DATA

df <- structure(list(Pt = c(123L, 123L, 123L, 144L, 1667L, 1667L, 1667L
), EVENT = c("GGG", "Nor", "tre", "GGG", "tre", "Nor", "tre")), .Names = c("Pt", 
"EVENT"), row.names = c(NA, -7L), class = "data.frame")
989
  • 12,579
  • 5
  • 31
  • 53