There are two issues with your code.
The first issue, which many have pointed out, is that the dummy-creation function has no name.
To get around that, simply assign the function definition to a variable. In this case, I've opted for the name make_dummies.
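As a minimal sketch of what "assigning the definition to a variable" means, here is a toy function (add_one is a made-up name, not part of your code). A function written without a name is created and immediately discarded; binding it to a name is what makes it callable later.

```r
# Hypothetical example: bind the function value to the name add_one
add_one = function(x) {
  x + 1
}

# The function can now be called by that name
result = add_one(2)
```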
The main issue for this post is that freq[i] gives you a count instead of the string to be matched.
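To see the difference, here is a small sketch on a made-up vector x (illustration only, not your data). Indexing a table by position returns the count for that category, whereas names() returns the category labels, which is what the comparison needs.

```r
# Hypothetical toy vector, for illustration only
x = c("a", "b", "b", "c", "c", "c")
freq = sort(table(x))

# Positional indexing returns the count stored for that category
count_first = freq[[1]]

# names() returns the category labels themselves
labels = names(freq)

# Comparing the data against a label flags the matching rows
is_b = ifelse(x == labels[2], 1, 0)
```

Looping over names(freq) rather than over positions therefore hands each category string directly to the comparison.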
Corrections
Create some data to test with:
# Make some data
n = 10
set.seed(112)
d = data.frame(id = 1:n,
               cat_var = sample(letters, n, replace = TRUE),
               num_var = runif(n),
               stringsAsFactors = FALSE)
tempcsv = tempfile()
write.csv(d, file = tempcsv, row.names = FALSE)
Sample of Data:
id cat_var num_var
1 j 0.2359040
2 x 0.3471606
3 z 0.9049400
4 z 0.6322996
5 g 0.6743289
6 c 0.9700548
7 b 0.5604765
8 s 0.5553125
9 d 0.7432414
10 k 0.3701336
Dummy Variable Code:
# Read that data in
filename = tempcsv # file.choose()
filedata = read.csv(filename, stringsAsFactors = FALSE)
cat_name = "cat_var" # readline(prompt="Enter the categorical name to create its dummy variables: ")

make_dummies = function(filedata, cat_name)
{
  data_cat = filedata[, cat_name]
  if (is.character(data_cat))
  {
    # Count each category, sorted from least to most frequent
    freq = sort(table(data_cat))
    # Drop the first (least frequent) category; it gets no dummy column
    freq = freq[-1]
    for (i in names(freq))
    {
      colName = paste(cat_name, i, sep = "_")
      filedata[, colName] = ifelse(data_cat == i, 1, 0) # Note the change here
    }
    # Remove the original categorical column
    filedata[, cat_name] = NULL
    print("Successfully created dummy variables...")
  } else
  {
    print("Please enter a categorical variable with character as its datatype")
  }
  return(filedata)
}
Sample Call:
(filedata = make_dummies(filedata, cat_name))
Output:
id num_var cat_var_c cat_var_d cat_var_g cat_var_j cat_var_k cat_var_s cat_var_x cat_var_z
1 1 0.2359040 0 0 0 1 0 0 0 0
2 2 0.3471606 0 0 0 0 0 0 1 0
3 3 0.9049400 0 0 0 0 0 0 0 1
4 4 0.6322996 0 0 0 0 0 0 0 1
5 5 0.6743289 0 0 1 0 0 0 0 0
6 6 0.9700548 1 0 0 0 0 0 0 0
7 7 0.5604765 0 0 0 0 0 0 0 0
8 8 0.5553125 0 0 0 0 0 1 0 0
9 9 0.7432414 0 1 0 0 0 0 0 0
10 10 0.3701336 0 0 0 0 1 0 0 0
Future Use
Also, I would highly advise you to use R's built-in model.matrix() function, with the variable cast to a factor instead of left as string-typed data.
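For example, on data that still contains cat_var (make_dummies() removes that column, so run this on the data as originally read in):
model.matrix(~ cat_var - 1, filedata)
Output (from a different run of the sample data, hence the different letters):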
cat_vara cat_varg cat_varm cat_varo cat_vart cat_varu cat_varw cat_varz
1 1 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0
3 0 0 0 0 0 0 1 0
4 0 0 1 0 0 0 0 0
5 0 1 0 0 0 0 0 0
6 0 0 0 0 1 0 0 0
7 0 0 0 0 1 0 0 0
8 0 0 0 0 0 1 0 0
9 0 0 0 0 0 0 0 1
10 0 0 0 1 0 0 0 0
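The explicit factor cast mentioned above can be sketched as follows. The data frame toy and its values are made up for illustration; only factor() and model.matrix() come from base R.

```r
# Hypothetical toy data, for illustration only
toy = data.frame(id = 1:4,
                 cat_var = c("a", "b", "a", "c"),
                 stringsAsFactors = FALSE)

# Cast the character column to a factor explicitly
toy$cat_var = factor(toy$cat_var)

# "- 1" drops the intercept, so every level gets its own indicator column
full = model.matrix(~ cat_var - 1, data = toy)

# With the intercept kept, the first level ("a") becomes the reference
# level and is represented by the "(Intercept)" column instead
reduced = model.matrix(~ cat_var, data = toy)
```

The second form is the closer analogue of make_dummies(), which also leaves one category without a column of its own.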