Unaggregating concatenated levels in a factor variable and creating a frequency table

Question

I have a column 'X1' whose entries are concatenated levels. These need to be split into individual levels, and then I need a frequency table of co-occurring levels.

items   x1
ram     [a,b,c ]
pam     [d,e,f]

has to be transformed to

items   a   b   c   d   e   f
ram     1   1   1   0   0   0
pam     0   0   0   1   1   1

Please advise.

Related: [*Generate a dummy-variable*](https://stackoverflow.com/q/11952706/2204410) – Jaap, Sep 25 '18 at 06:43
It's not the same as the link you showed. My column 'X1' has categories stacked in it: a, b, c, d, e, f are each individual categories for which I want to create individual dummy columns – Devesh, Sep 25 '18 at 06:46

akrun · Answer 1 · 2018-09-25T06:18:10.183

Based on the input shown, the values in the second column appear to be strings. One option is to extract the letters from the 'ram' column with str_extract_all (stringr), stack the result into a two-column data.frame, and get the frequency count with table after converting the 'values' column to a factor with the levels specified, so that levels not found in the dataset get a count of 0. The table is then reshaped to 'long' format with as.data.frame.

library(stringr)
df2 <- stack(setNames(str_extract_all(df1$ram, '[a-z]'), seq_len(nrow(df1))))[2:1]
out <- as.data.frame(table(df2$ind, factor(df2$values, levels = letters[1:6])))[-1]
names(out) <- names(df1)
out
#   items ram
#1     a   1
#2     b   1
#3     c   1
#4     d   0
#5     e   0
#6     f   0

data

df1 <- data.frame(items = 'x1', ram = '[a,b,c]', stringsAsFactors = FALSE)
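The same extract, stack, and tabulate idea can also produce the wide layout asked for in the question. The following is a minimal base R sketch, not part of the original answer; the data frame `df` and its column names `items` and `x1` are assumptions that mirror the question's example.

```r
# Hypothetical data mirroring the question: one bracketed string per item
df <- data.frame(items = c("ram", "pam"),
                 x1 = c("[a,b,c ]", "[d,e,f]"),
                 stringsAsFactors = FALSE)

# Pull out the individual lowercase letters for each row
lvls <- regmatches(df$x1, gregexpr("[a-z]", df$x1))

# Long format: one row per (item, level) pair
long <- data.frame(items = rep(df$items, lengths(lvls)),
                   level = unlist(lvls),
                   stringsAsFactors = FALSE)

# Cross-tabulate; fixing the factor levels keeps the original row order
wide <- table(factor(long$items, levels = df$items), long$level)

# Convert the table to a data.frame with one column per level
wide_df <- as.data.frame.matrix(wide)
wide_df
#     a b c d e f
# ram 1 1 1 0 0 0
# pam 0 0 0 1 1 1
```

Here the pattern `[a-z]` assumes every level is a single lowercase letter; longer level names would need a split on commas instead.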

score 0 · Answer 2 · answered Sep 25 '18 at 06:32

Using the dummies library:

library(dummies)
df <- dummy.data.frame(df, names = c("MyField1"), sep = "_")

Note: This splits the original field into one column per unique value. The original field is no longer available in the data frame.

Example:

Data:

After:

df <- dummy.data.frame(df, names = c("MyField1"), sep = "_")

Anil Kumar

My column 'X1' has categories stacked in it: a, b, c, d, e, f are each individual categories for which I want to create individual dummy columns. I need to unstack those categories and then create dummy variables – Devesh Sep 25 '18 at 06:48
Here MyField1 is unstacked according to the available categories (A, B, C), creating an individual column for each category. Maybe I am unable to follow your question; please elaborate – Anil Kumar Sep 25 '18 at 07:12
If I take your example, then MyField1 should have entries like (A,B), (B,C,D), (A,C) etc. in the 3 records respectively. The columns should then be populated accordingly: for the first entry, dummy columns A and B would be 1 and the remaining dummy columns would be 0, and so on – Devesh Sep 25 '18 at 07:39
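The case described in the last comment could be sketched as follows. This is an illustration only, in base R rather than the dummies package: the data frame, the `id` column, and the parenthesised string format are assumptions taken from the wording of the comment. The final step uses `crossprod` on the indicator matrix as one possible way to get the co-occurrence counts mentioned in the question.

```r
# Hypothetical data following the comment: stacked categories per record
df <- data.frame(id = 1:3,
                 MyField1 = c("(A,B)", "(B,C,D)", "(A,C)"),
                 stringsAsFactors = FALSE)

# Strip parentheses and spaces, then split each entry on commas
parts <- strsplit(gsub("[() ]", "", df$MyField1), ",", fixed = TRUE)

# All categories that occur anywhere, in sorted order
cats <- sort(unique(unlist(parts)))

# Indicator matrix: one row per record, one column per category
ind <- t(vapply(parts,
                function(p) as.integer(cats %in% p),
                integer(length(cats))))
dimnames(ind) <- list(as.character(df$id), cats)

# Co-occurrence counts: entry [i, j] is the number of records that
# contain both category i and category j
cooc <- crossprod(ind)
```

In `ind`, the first record has 1 in columns A and B and 0 elsewhere. In `cooc`, the diagonal holds each category's overall frequency and the off-diagonal entries count how often two categories appear in the same record.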
