2

I have a dataset like this :

id value1 value2
1    A     True
2    B     False
3    A     True
4    C     True

I want to identify a column with multiple values and convert it into multiple columns with True or False values in R. The result would be :

id value1.A value1.B value1.C value2
1     True     False  False    True
2     False    True   False    False
3     True     False  False    True
4     False    True   False    True

I am not sure how to use dcast for this. I wrote a function myself but it is too slow. The code for it is here :

to_multiple_columns <- function(col,attr_name){
elements <- names(table(col))
drops <- c("","True","False")
elements <- elements[ !elements %in% drops]
new_df <- data.frame(col) # to define data frame with nrows,ncols
if(length(elements) > 0){
new_attr_names <- paste(attr_name,elements,sep = ".")
    for(j in 1:length(new_attr_names)){
       new_df <- data.frame(new_df,grepl(elements[j],col))
    }
    drops <- c("col") #drop original col
    new_df <- new_df[,!(names(new_df) %in% drops)]
    names(new_df) <- new_attr_names
  }
  return(new_df)
}
thelatemail
  • 91,185
  • 12
  • 128
  • 188
bee2502
  • 1,145
  • 1
  • 10
  • 13

3 Answers3

2

You can use tidyr like this:

library(dplyr)
library(tidyr)
df %>%
  mutate(value = TRUE) %>%
  spread(value1, value, fill = FALSE, sep = '.')

Output:

  id value2 value1.A value1.B value1.C
1  1   True     TRUE    FALSE    FALSE
2  2  False    FALSE     TRUE    FALSE
3  3   True     TRUE    FALSE    FALSE
4  4   True    FALSE    FALSE     TRUE
Gopala
  • 10,363
  • 7
  • 45
  • 77
2

We can use dcast

library(data.table)
dcast(setDT(df1)[, value:=TRUE], id+value2~value1, value.var="value", fill = FALSE)
#   id value2     A     B     C
#1:  1   True  TRUE FALSE FALSE
#2:  2  False FALSE  TRUE FALSE
#3:  3   True  TRUE FALSE FALSE
#4:  4   True FALSE FALSE  TRUE
akrun
  • 874,273
  • 37
  • 540
  • 662
1

In base R, you can use xtabs.

If not all elements in value1 appear, you can cast it to a factor.

data.raw <- "id value1 value2
1    A     True
2    B     False
3    A     True
4    C     True"

data <- read.table(textConnection(data.raw), header = T)

data$value2 <- data$value2 == "True"

xtabs(value2 ~ id + value1, data = data)
#>    value1
#> id  A B C
#>   1 1 0 0
#>   2 0 0 0
#>   3 1 0 0
#>   4 0 0 1

For large data, you can also produce a sparse matrix.

xtabs(value2 ~ id + value1, data = data, sparse = T)
Michael Griffiths
  • 1,399
  • 7
  • 14