R how to 'spread' data with no key-value pair

Question

I have data:

rowID    incidentID participant.type
1          1                A
2          1                B
3          2                A
4          3                A
5          3                B
6          3                C
7          4                B
8          4                C

And I would like to end up with:

   rowID incident participant.type participant.type.1 participant.type.2
    1        1                A                  B                   
    2        2                A                                      
    3        3                A                  B                  C
    4        4                B                  C

I tried the spread but can't achieve one line per incident; I don't think I have a way of creating a key-value pair so I wonder if there is some other method for doing this.

Related [Transpose / reshape dataframe without “timevar” from long to wide format](https://stackoverflow.com/questions/11322801/transpose-reshape-dataframe-without-timevar-from-long-to-wide-format) — markus, Jan 13 '19 at 11:18
That link provides several solutions and I have added one of them as an answer below, thanks — user3387656, Jan 13 '19 at 11:58

score 2 · Accepted Answer · answered Jan 13 '19 at 11:49

2

Before using spread(), you need to create a proper key argument.

df %>% select(-rowID) %>%
       group_by(incidentID) %>%
       mutate(id = 1:n()) %>%
       spread(id, participant.type)

#   incidentID  `1`   `2`   `3`  
#        <int>  <fct> <fct> <fct>
# 1          1  A     B     NA   
# 2          2  A     NA    NA   
# 3          3  A     B     C    
# 4          4  B     C     NA

answered Jan 13 '19 at 11:49

Darren Tsai

32,117
5
21
51

1

This is the best solution. – Steen Harsted Jan 13 '19 at 12:06

score 1 · Answer 2 · answered Jan 13 '19 at 12:04

Since your grouping is based on the row order within the icidentID column. The following simple solution will also work.

It is just filtering the dataframe and then merging in the end.

It is probably not the best solution in terms of effective use of computing power, but it is easy to understand.

library(tidyverse)

df <- 
  tribble(
    ~rowID,    ~incidentID, ~participant.type,
    1,          1,                "A",
    2,          1,                "B",
    3,          2,                "A",
    4,          3,                "A",
    5,          3,                "B",
    6,          3,                "C",
    7,          4,                "B",
    8,          4,                "C")

df_1 <- df %>%
  select(-rowID) %>% 
  group_by(incidentID) %>% 
  filter(row_number()==1)


df_2 <- df %>%
  select(-rowID) %>% 
  group_by(incidentID) %>% 
  filter(row_number()==2) %>% 
  rename(participant.type.1 = participant.type)


df_3 <- df %>%
  select(-rowID) %>% 
  group_by(incidentID) %>% 
  filter(row_number()==3) %>% 
  rename(participant.type.2 = participant.type) 

full_join(df_1, full_join(df_2, df_3))

Result:

Joining, by = "incidentID"
Joining, by = "incidentID"
# A tibble: 4 x 4
# Groups:   incidentID [?]
  incidentID participant.type participant.type.1 participant.type.2
       <dbl> <chr>            <chr>              <chr>             
1          1 A                B                  NA                
2          2 A                NA                 NA                
3          3 A                B                  C                 
4          4 B                C                  NA

Paweł Chabros · Answer 3 · 2019-01-13T11:26:02.487

0

Here's my solution:

df %>%
  select(-rowID) %>%
  group_by(incidentID) %>%
  nest() %>%
  mutate(data = map_chr(data, ~str_c(.x$participant.type, collapse = '_'))) %>%
  separate(data, paste0('participant.type.', 0:2)) %>%
  mutate_at(2:4, ~replace_na(.x, ''))

edited Jan 13 '19 at 11:26

answered Jan 13 '19 at 11:14

Paweł Chabros

2,349
1
9
12

score 0 · Answer 4 · answered Jan 13 '19 at 11:37

We can use reshape2::dcast for this

reshape2::dcast(df, insidentID ~ participant.type)    
  #   insidentID    A    B    C
  # 1          1 <NA>    B <NA>
  # 2          8 <NA>    B <NA>
  # 3         12 <NA> <NA>    C
  # 4         16    A <NA> <NA>
  # 5         24 <NA>    B <NA>
  # 6         27 <NA>    B    C
  # 7         29 <NA> <NA>    C

with the data

set.seed(123)
df <- data.frame(insidentID = sample(0:30, 8L, replace = TRUE),
                 participant.type = sample(LETTERS[1:3], 8L, replace = TRUE),
                 stringsAsFactors = FALSE)
df
#   insidentID participant.type
# 1          8                B
# 2         24                B
# 3         12                C
# 4         27                B
# 5         29                C
# 6          1                B
# 7         16                A
# 8         27                C

score 0 · Answer 5 · answered Jan 13 '19 at 11:57

0

The 'related question' link provided by @markus shows a variety of other solutions, including what appears to be the most concise in a tidyverse format:

 df %>% 
  group_by(incidentID) %>%
  mutate(rn = paste0("newcolumn",row_number()))  %>%
  spread(rn, participant.type)

gives:

incidentID newcolumn1 newcolumn2 newcolumn3
       <int> <fct>      <fct>      <fct>     
1          1 A          B          NA        
2          2 A          NA         NA        
3          3 A          B          C         
4          4 B          C          NA

A

answered Jan 13 '19 at 11:57

user3387656

93
1
1
7

If you don't remove the variable `rowID`, the output won't be what you show above. And your solution is equivalent to mine. You don't need to post it again. – Darren Tsai Jan 13 '19 at 12:03
I appreciate your help, I think I posted the code based on the link before I saw yours. I've marked yours as the solution. – user3387656 Jan 13 '19 at 15:52

R how to 'spread' data with no key-value pair

5 Answers5