I have a data frame, in which there is a column comprised of lists of values, such as

x1 x2
ID1 a1
ID2 c(a1,a2)
ID3 c(a1,a3,a4,a10)

and so on. How do I determine the unique values of the elements in column x2, and then convert them into a sparse matrix of binary values? For instance, the unique values here are c(a1,a2,a3,a4,a10). I need to determine the unique values of the various lists in x2, and then have this represented as a series of binary columns, such as:

x1 x2 a1 a2 a3 a4 a10
ID1 a1 1 0 0 0 0
ID2 c(a1,a2) 1 1 0 0 0
ID3 c(a1,a3,a4,a10) 1 0 1 1 1

Using the unique() function considers the uniqueness of each entire list rather than the elements that comprise it, which is logical, as it applies the function to each row. So how would I make it consider the individual elements of the lists that make up each row of the data frame?

What I want to do here seems like one-hot encoding, but I wasn't sure how to do that when the column x2 is a column of lists. Am I approaching this wrong? I saw a question that has done this in Python. Is there a way to do this in dplyr?

Arjun Mohan
  • Try `library(dplyr);library(tidyr);library(qdapTools); df1 %>% mutate(out = mtabulate(x2)) %>% unpack(out)` or another option is `df1 %>% unnest(x2) %>% mutate(val = 1) %>% pivot_wider(names_from = x2, values_from = val, values_fill = 0)` – akrun Jun 16 '22 at 19:54
  • Method 1 gives me the error 'unable to find an inherited method for function ‘unpack’ for signature ‘"data.frame"’'; method 2 gives me the error 'Error in `stop_vctrs()`: ! Can't convert to'. Not sure what's causing that. – Arjun Mohan Jun 23 '22 at 14:10
  • then use method 2 – akrun Jun 23 '22 at 14:46
  • I used mtabulate, but by converting each list to a string first and then using the following syntax: `cbind(df1, mtabulate(strsplit(df1$x2, ",")))` – Arjun Mohan Jun 23 '22 at 15:15
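
For reference, the one-hot encoding described in the question can also be sketched in base R without any extra packages. This is only an illustration, not the approach from the comments: it assumes x2 is a true list column of character vectors, and the names `df1`, `lv`, `onehot`, and `result` are made up for the example.

```r
# Hypothetical example data mirroring the question; x2 is assumed to be a
# list column of character vectors.
df1 <- data.frame(x1 = c("ID1", "ID2", "ID3"), stringsAsFactors = FALSE)
df1$x2 <- list("a1", c("a1", "a2"), c("a1", "a3", "a4", "a10"))

# Unique elements across all rows, in order of first appearance.
lv <- unique(unlist(df1$x2))

# One row per list entry: 1 if the value occurs in that row's vector, else 0.
onehot <- do.call(rbind, lapply(df1$x2, function(v) as.integer(lv %in% v)))
colnames(onehot) <- lv

# Append the indicator columns to the original data frame.
result <- cbind(df1, onehot)
```

If x2 actually holds comma-separated strings rather than vectors, it would need to be split first, for example with `strsplit(df1$x2, ",")` as in the last comment. If a sparse representation is really needed, the dense `onehot` matrix could probably be converted with `Matrix::Matrix(onehot, sparse = TRUE)`, assuming the Matrix package is installed.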
