I have a column 'X1' whose entries are concatenated lists of categories. These need to be split into individual levels, and then I need a frequency table of co-occurring levels.

items   x1
ram     [a,b,c]
pam     [d,e,f]

has to be transformed to

items   a   b   c   d   e   f
ram     1   1   1   0   0   0
pam     0   0   0   1   1   1

Please advise.

Devesh
  • Related: [*Generate a dummy-variable*](https://stackoverflow.com/q/11952706/2204410) – Jaap Sep 25 '18 at 06:43
  • It's not the same as the link you showed. My column 'X1' has categories stacked in it: a, b, c, d, e, f are each individual categories for which I want to create individual dummy columns – Devesh Sep 25 '18 at 06:46

2 Answers

Based on the input shown, the values in the second column appear to be strings. One option is to extract the letters from the 'ram' column with str_extract_all (stringr), stack the result into a two-column data.frame, and get the frequency count with table after converting the 'values' column to a factor with explicit levels, so that levels not found in the dataset get a count of 0. as.data.frame then reshapes the table to 'long' format.

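library(stringr)
# extract every lowercase letter per row, then stack into columns 'ind' (row id) and 'values'
df2 <- stack(setNames(str_extract_all(df1$ram, '[a-z]'), seq_len(nrow(df1))))[2:1]
# count each level per row (absent levels get 0), convert to long format, drop the row-id column
out <- as.data.frame(table(df2$ind, factor(df2$values, levels = letters[1:6])))[-1]
# reuse the original column names
names(out) <- names(df1)
out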
#   items ram
#1     a   1
#2     b   1
#3     c   1
#4     d   0
#5     e   0
#6     f   0

data

df1 <- data.frame(items = 'x1', ram = '[a,b,c]', stringsAsFactors = FALSE)
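
For the two-row input in the question, the same split-and-count idea can be sketched in base R as follows. This is a minimal sketch rather than part of the code above: the data frame `df` and its column names `items` and `x1` are assumptions taken from the question's example, and the categories are assumed to be comma-separated inside square brackets.

```r
# Assumed input, mirroring the example in the question
df <- data.frame(items = c("ram", "pam"),
                 x1 = c("[a,b,c ]", "[d,e,f]"),
                 stringsAsFactors = FALSE)

# Drop brackets and whitespace, then split each entry on commas
parts <- strsplit(gsub("\\[|\\]|\\s", "", df$x1), ",", fixed = TRUE)

# One row per (item, level) pair; the factor keeps the original item order
long <- data.frame(
  items = factor(rep(df$items, lengths(parts)), levels = unique(df$items)),
  level = unlist(parts)
)

# Cross-tabulate items against levels, then turn the table into a data frame
out <- as.data.frame.matrix(table(long$items, long$level))
out <- cbind(items = rownames(out), out)
rownames(out) <- NULL
out
#   items a b c d e f
# 1   ram 1 1 1 0 0 0
# 2   pam 0 0 0 1 1 1
```

Because `table` counts every (item, level) pair, a level listed twice for the same item would show 2 rather than 1; wrapping the result in `pmin(..., 1)` would be one way to force strict 0/1 indicators if that matters.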
akrun

Using the dummies library:

library(dummies)
df <- dummy.data.frame(df, names = c("MyField1"), sep = "_")

Note: This splits the original field into one column per unique value. The original field is no longer available in the data frame.

Example:

Data:

[image: data frame before the transformation]

after

df <- dummy.data.frame(df, names = c("MyField1"), sep = "_")

[image: data frame after the transformation]

Anil Kumar
  • My column 'X1' has categories stacked in it: a, b, c, d, e, f are each individual categories for which I want to create individual dummy columns. I need to unstack those categories and then create dummy variables – Devesh Sep 25 '18 at 06:48
  • Here MyField1 is unstacked according to the available categories (A, B, C), creating an individual column for each category. Maybe I am not following your question; please elaborate – Anil Kumar Sep 25 '18 at 07:12
  • If I take your example, then MyField1 should have entries like (A,B), (B,C,D), (A,C) in the 3 records respectively. The columns should then be populated accordingly: for the first entry, dummy columns A and B would both be 1 and the remaining dummy columns would be 0, and so on – Devesh Sep 25 '18 at 07:39
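
The layout described in the last comment can be sketched in base R, together with the co-occurrence counts the question asks about. This is only an illustration under assumptions: the data frame `df`, its `id` column, and the comma-separated form of `MyField1` are hypothetical, made up to match the comment's example of (A,B), (B,C,D), (A,C).

```r
# Hypothetical data matching the example in the comment
df <- data.frame(id = 1:3,
                 MyField1 = c("A,B", "B,C,D", "A,C"),
                 stringsAsFactors = FALSE)

# Split each record into its categories
parts <- strsplit(df$MyField1, ",", fixed = TRUE)
all_levels <- sort(unique(unlist(parts)))

# Indicator matrix: one row per record, one 0/1 column per category
ind <- t(vapply(parts,
                function(p) as.integer(all_levels %in% p),
                integer(length(all_levels))))
colnames(ind) <- all_levels

# Dummy columns appended to the original data
dummies <- cbind(df, ind)

# Co-occurrence counts: cell [i, j] is the number of records containing both i and j
cooc <- crossprod(ind)
```

The diagonal of `cooc` holds how often each category appears on its own count, and the off-diagonal cells count the records in which both categories appear together.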