R dataframe, split one column by the values in another column

Question

I have a dataframe in R that looks similar to the following:

A    B     C

A    X     1   
A    Y     3   
A    Z     3
A    Z     2

How can I convert the single column 'C' into multiple columns that correspond to their values in B, given that all the columns contain factors and not strings or integers?

I want something that looks like below, either with or without the B/C columns still in the dataframe.

A    B    C     X    Y    Z

A    X    1     1    NA   NA
A    Y    3     NA   3    NA
A    Z    3     NA   NA   3
A    Z    2     NA   NA   2

Ideally, I would like my final output to be a dataframe in R, as I intend to merge it with another dataframe that has matching values A values.

Apologies if this post is duplicate but I didn't spot any where I could tell somebody was asking the same thing. - Thanks!

akrun · Answer 1 · 2018-11-28T19:25:16.720

We can use dcast from data.table to convert the 'column 'C' to 'wide' and then cbind with the original dataset

library(data.table)
cbind(df1, dcast(setDT(df1), seq_len(nrow(df1))~ B,
           value.var = 'C')[, -1, with = FALSE])
#    A B C  X  Y  Z
#1: A X 1  1 NA NA
#2: A Y 3 NA  3 NA
#3: A Z 3 NA NA  3
#4: A Z 2 NA NA  2

data

df1 <- structure(list(A = c("A", "A", "A", "A"), B = c("X", "Y", "Z", 
 "Z"), C = c(1L, 3L, 3L, 2L)), class = "data.frame", row.names = c(NA, 
 -4L))

score 0 · Answer 2 · answered Nov 28 '18 at 19:22

Here are two tidyverse ways that I will admit are slightly clunky. They both make use of tidyr::spread to get wide-shaped data, and then bind the wide version back to the original.

library(tidyr)
library(dplyr)

To use spread, you first need some sort of identification for each row. A quick way to do that is with tibble::rowid_to_column.

df %>%
  tibble::rowid_to_column() %>%
  spread(key = B, value = C)
#>   rowid A  X  Y  Z
#> 1     1 A  1 NA NA
#> 2     2 A NA  3 NA
#> 3     3 A NA NA  3
#> 4     4 A NA NA  2

You can do that, then column-bind this to the original data frame, but your columns will be out of order and you'll have a column A1 that is identical to column A. Using select, you can pick the columns you want in the proper order.

df %>%
  tibble::rowid_to_column() %>%
  spread(key = B, value = C) %>%
  bind_cols(df) %>%
  select(A, B, C, X, Y, Z)
#>   A B C  X  Y  Z
#> 1 A X 1  1 NA NA
#> 2 A Y 3 NA  3 NA
#> 3 A Z 3 NA NA  3
#> 4 A Z 2 NA NA  2

A little nicer is to do the spread operation inside your column binding, then just drop the 2 extraneous columns.

bind_cols(
  df, 
  df %>% tibble::rowid_to_column() %>% spread(key = B, value = C)
) %>%
  select(-rowid, -A1)
#>   A B C  X  Y  Z
#> 1 A X 1  1 NA NA
#> 2 A Y 3 NA  3 NA
#> 3 A Z 3 NA NA  3
#> 4 A Z 2 NA NA  2

^{Created on 2018-11-28 by the reprex package (v0.2.1)}

R dataframe, split one column by the values in another column

2 Answers2

data