I have a data frame in which one column is a list column (each cell holds a vector of values), such as:
x1 | x2 |
---|---|
ID1 | a1 |
ID2 | c(a1,a2) |
ID3 | c(a1,a3,a4,a10) |
and so on. How do I determine the unique values of the elements in column x2 and then convert them into a sparse matrix of binary values? Here the unique values are c(a1,a2,a3,a4,a10), and I would like each one represented as its own binary column, such as:
x1 | x2 | a1 | a2 | a3 | a4 | a10 |
---|---|---|---|---|---|---|
ID1 | a1 | 1 | 0 | 0 | 0 | 0 |
ID2 | c(a1,a2) | 1 | 1 | 0 | 0 | 0 |
ID3 | c(a1,a3,a4,a10) | 1 | 0 | 1 | 1 | 1 |
Calling unique() on the column compares whole list entries rather than the elements inside them, which makes sense since each row's list is treated as a single value. How would I get it to consider the individual elements of the lists in each row instead?
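To make the unique() behaviour reproducible, here is a minimal sketch. The way `df` is built is my assumption about how the data looks (x2 as a list column of character vectors); flattening with `unlist()` first is one way I found to get at the individual elements, though I'm not sure it's the idiomatic route.

```r
# Assumed reconstruction of the example data; x2 is a list column.
df <- data.frame(x1 = c("ID1", "ID2", "ID3"), stringsAsFactors = FALSE)
df$x2 <- list("a1", c("a1", "a2"), c("a1", "a3", "a4", "a10"))

# unique() on the list column compares whole entries,
# so all three rows count as distinct
n_whole <- length(unique(df$x2))

# Flattening the list first gives the distinct elements across all rows,
# in order of first appearance
vals <- unique(unlist(df$x2))
vals
```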
What I want seems like one-hot encoding, but I'm not sure how to do that when x2 is a list column. Am I approaching this wrong? I saw a question that does this in Python; is there a way to do it with dplyr?
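For concreteness, here is a rough base R sketch that produces the layout I'm after for the toy data. The data construction and the names `vals`, `onehot` and `result` are my own assumptions, and this builds a dense 0/1 matrix rather than a true sparse one, so I doubt it is the right approach for a large data frame.

```r
# Assumed reconstruction of the example data; x2 is a list column.
df <- data.frame(x1 = c("ID1", "ID2", "ID3"), stringsAsFactors = FALSE)
df$x2 <- list("a1", c("a1", "a2"), c("a1", "a3", "a4", "a10"))

# Distinct elements across all rows become the indicator columns
vals <- unique(unlist(df$x2))

# For each row, mark which of the distinct values it contains.
# vapply() returns one column per row, so transpose to get one row per ID.
onehot <- t(vapply(df$x2,
                   function(v) as.integer(vals %in% v),
                   integer(length(vals))))
colnames(onehot) <- vals

# Attach the indicator columns to the original data frame
result <- cbind(df, onehot)
```

A tidyverse equivalent (for example unnesting x2 and pivoting wider) or a genuinely sparse representation would be more useful to me than this.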