I have a dataset that looks like the following.

df = data.frame(val=c(4,2,6,3,4,5),
                algo=c("A","A","A","C","C","C"),
                id=c("james","james","james",
                     "james","james","james"))
df

I want to alter the structure of the data frame so that it's in wide format.

id     algo.A    algo.C
james   4         3 
james   2         4
james   6         5

I tried tidyr's spread for this but got the following error.

> spread(df, id, algo)
Error: Duplicate identifiers for rows (1, 5)

Any suggestions on how to get the desired result?

ATMA

1 Answer

We need a sequence column because there are duplicate identifiers. Specifically, spread cannot tell that the rows within each algo are meant to be distinct, because they all have the same value for id. Adding a row number within each id/algo group makes every row uniquely identifiable. Specifying the sep argument in spread then builds the column names as the key name, the separator, and the key value (algo.A, algo.C):

library(tidyverse)
df %>%
    group_by(id, algo) %>%        
    mutate(rn = row_number()) %>%
    spread(algo, val, sep = ".") %>%
    select(-rn)
# A tibble: 3 x 3
# Groups:   id [1]
#   id    algo.A algo.C
#   <fct>  <dbl>  <dbl>
#1 james      4      3
#2 james      2      4
#3 james      6      5
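
In newer versions of tidyr, spread is superseded by pivot_wider, so the same idea can be sketched as follows. This is only a minimal sketch, assuming tidyr 1.0.0 or later; the name wide is made up for illustration, and names_prefix is used here in place of sep to get the algo. prefix.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(
    val = c(4, 2, 6, 3, 4, 5),
    algo = c("A", "A", "A", "C", "C", "C"),
    id = c("james", "james", "james", "james", "james", "james")
)

wide <- df %>%
    group_by(id, algo) %>%
    mutate(rn = row_number()) %>%   # sequence column within each id/algo group
    ungroup() %>%
    pivot_wider(names_from = algo,  # one new column per value of algo
                values_from = val,
                names_prefix = "algo.") %>%
    select(-rn)                     # drop the helper column again

wide
```

The rn column plays the same role as above: together with id it uniquely identifies each output row, so pivot_wider does not have to collapse several values into one cell.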

data

df <- data.frame(
    val = c(4, 2, 6, 3, 4, 5), 
    algo = c("A", "A", "A", "C", "C", "C"),
    id = c("james", "james", "james", "james", "james", "james")
) 
akrun