I am trying to create a third variable so that I can convert my data from long to wide format.

My data has the following form:

ID <- c(1,1,1,2,2,2,2,3,3,4,5,5)
CODE <- c(123,222,231,534,634525,3545,2342,235234,3453,2342,5345,64564)
df <- data.frame(ID, CODE)
df

   ID   CODE
1   1    123
2   1    222
3   1    231
4   2    534
5   2 634525
6   2   3545
7   2   2342
8   3 235234
9   3   3453
10  4   2342
11  5   5345
12  5  64564

But what I am trying to create is something in this form:

    ID2  code1  code2 code3 code4
  1   1    123    222   231      
  2   2    534 634525  3545  2342
  3   3 235234   3453  

The highest "code#" column number should be determined by the ID that has the most rows in df. As in the desired output above, any cell with no value for an ID should be coded as "".

TimF

1 Answer

Here's a tidyverse approach:

library(tidyverse)

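df %>%
  group_by(ID) %>%
  # number the rows within each ID: code1, code2, ...
  mutate(key = paste0("code", row_number())) %>%
  # one column per key; IDs with fewer codes get NA
  spread(key, CODE)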

# A tibble: 5 x 5
# Groups:   ID [5]
     ID  code1  code2 code3 code4
  <dbl>  <dbl>  <dbl> <dbl> <dbl>
1     1    123    222   231    NA
2     2    534 634525  3545  2342
3     3 235234   3453    NA    NA
4     4   2342     NA    NA    NA
5     5   5345  64564    NA    NA
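
The output above has NA where the question asks for "". As a minimal sketch, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0 are installed, the same reshape can be written with pivot_wider() (which supersedes spread()), then converted to character so the missing cells can be blanked. The names wide and wide_blank are only illustrative.

```r
library(dplyr)
library(tidyr)

# Example data from the question
ID <- c(1,1,1,2,2,2,2,3,3,4,5,5)
CODE <- c(123,222,231,534,634525,3545,2342,235234,3453,2342,5345,64564)
df <- data.frame(ID, CODE)

wide <- df %>%
  group_by(ID) %>%
  # number the rows within each ID: code1, code2, ...
  mutate(key = paste0("code", row_number())) %>%
  ungroup() %>%
  # one column per key; IDs with fewer codes get NA
  pivot_wider(names_from = key, values_from = CODE)

# "" is a string, so the code columns must become character first
wide_blank <- wide %>%
  mutate(across(starts_with("code"), ~ replace_na(as.character(.x), "")))

wide_blank
```

Note that blanking the cells turns the code columns into character vectors, so keep the NA version if the codes are needed as numbers later.
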
JasonAizkalns