0

I have a large dataframe containing a cross table of keys from other tables. Instead of having multiple instances of key1 coupled with different values for key2 I would like there to be one row for each key1 with several columns instead.

I tried doing this with a for loop but it couldn't get it to work.

Here's an example. I have a data frame with the structure df1 and I would like it to have the structure of df2.

df1 <- data.frame(c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c", "d"),c(1, 2, 3, 2, 3, 1, 2, 3, 4, 5, 9))
names(df1) <- c("key1", "key2")


df2 <- data.frame(c("a", "b", "c", "d"), c(1, 2, 1, 9), c(2, 3, 2, NA), c(3, NA, 3, NA), c(NA, NA, 4, NA), c(NA, NA, 5, NA))
names(df2) <- c("key1", "key2_1", "key2_2", "key2_3", "key2_4", "key2_5")

I suspect this is possible using an approach utilizing apply but I haven't found a way yet. Any help is appreciated!

erikfjonsson
  • 197
  • 1
  • 11
  • 1
    Using `dplyr` and `tidyr`, you could do `df1 %>% group_by(key1) %>% mutate(rn = row_number()) %>% spread(rn, key2) ` – Ronak Shah Aug 16 '19 at 09:05

1 Answers1

1
library(dplyr)
library(tidyr)

df1 %>% 
  group_by(key1) %>% 
  mutate(var = paste0("key2_", seq(n()))) %>% 
  spread(var, key2)

# # A tibble: 4 x 6
# # Groups:   key1 [4]
#     key1  key2_1 key2_2 key2_3 key2_4 key2_5
#     <fct>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#   1 a          1      2      3     NA     NA
#   2 b          2      3     NA     NA     NA
#   3 c          1      2      3      4      5
#   4 d          9     NA     NA     NA     NA
Adam Quek
  • 6,973
  • 1
  • 17
  • 23