1

I am writing a function and want to pass two user-supplied inputs, filedata and cat_name, as its arguments. The problem is that the values stored in filedata and cat_name do not seem to reach the function. I don't understand why this is happening. What should I do?

Code:

filename = file.choose()
filedata = read.csv(filename, stringsAsFactors = F)

cat_name = readline(prompt="Enter the categorical name to create its dummy variables: ")

function(filedata, cat_name)
{
  data_cat=filedata[,cat_name]

  if(class(data_cat)=="character")
  {
    freq=sort(table(data_cat))
    freq=freq[-1]
    for( i in names(freq))
    {
      colName = paste(cat_name,i,sep="_")
      filedata[,colName] = ifelse(data_cat==freq[i],1,0)

    }
    filedata[,cat_name]=NULL

    print("Successfully created dummy variables...")
  } else
  {

    print("Please enter a categorical variable with character as its datatype")
  }
  return(filedata)
}
coatless
richa1465
  • You need to assign your function and then apply it to the variables. Something like `my_fun <- function(filedata,cat_name) {...};my_fun(filedata,cat_name)`. – Stibu Jul 09 '16 at 21:02
  • Please have a look at [this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and edit your question. – user2100721 Jul 09 '16 at 21:05
  • I made the changes as suggested: I defined `myfunction = function(filedata, catname){...}` and then ran `myfunction(filedata, cat_name)`. The function runs and even prints "Successfully created dummy variables", but my data frame does not change. I think this is because my input is not getting mapped to the function's arguments. How do I map my filedata and cat_name inputs to the function arguments? – richa1465 Jul 09 '16 at 21:59
  • Did you assign the function's result to a data frame, since the function returns one: `newdf <- myfunction(filedata, cat_name)` – Parfait Jul 09 '16 at 22:05
  • Yes, I assigned it, but when I run names(newdf) it gives NULL. I think this is because my input for filedata and cat_name is not getting passed to the function parameters. What should I do? – richa1465 Jul 09 '16 at 22:28

2 Answers

1

There are two issues with your code.

The first issue, which several people have pointed out, is that the dummy-creation function has no name, so it can never be called.

To get around that, assign the function definition to a variable. In this case, I've picked make_dummies.
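As a minimal sketch of why both the name and the assignment of the result matter: R functions work on a copy of their arguments, so a change made inside the function is only visible through the returned value. The helper `add_flag` and the data frame `d` are hypothetical names used purely for illustration.

```r
# Hypothetical helper, used only for illustration.
add_flag <- function(df) {
  df$flag <- 1   # changes the function's local copy only
  df             # the modified copy is the return value
}

d <- data.frame(x = 1:3)

add_flag(d)              # result is printed and then discarded
"flag" %in% names(d)     # FALSE: d itself is unchanged

d <- add_flag(d)         # assign the result to keep the new column
"flag" %in% names(d)     # TRUE
```

The same pattern applies to make_dummies: its result has to be assigned back (or to a new variable) for the dummy columns to persist.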

The main issue for this post is that freq[i] gives you a count instead of the string to be matched.
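A small sketch of the difference, using made-up values rather than the asker's data: the names of the table hold the category labels, while the values hold the counts, so indexing by name returns a number.

```r
data_cat <- c("a", "b", "b", "c", "c", "c")  # made-up values
freq <- sort(table(data_cat))

names(freq)              # the category labels: "a" "b" "c"
as.vector(freq["b"])     # the count stored under "b": 2

# Comparing the labels with the count does not match anything here
by_count <- data_cat == as.vector(freq["b"])
# Comparing the labels with the label itself picks out the right rows
by_label <- data_cat == "b"

any(by_count)            # FALSE
sum(by_label)            # 2
```

Inside the loop, `i` already is the label, so `data_cat == i` is the comparison that builds a correct dummy column.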

Corrections

Create some data to test with:

# Make some data
n = 10

set.seed(112)
d = data.frame(id = 1:n,
               cat_var = sample(letters,n, replace = T),
               num_var = runif(n),
               stringsAsFactors = F
               )

tempcsv = tempfile()

write.csv(d, file=tempcsv, row.names = F)

Sample of Data:

 id cat_var   num_var
  1       j 0.2359040
  2       x 0.3471606
  3       z 0.9049400
  4       z 0.6322996
  5       g 0.6743289
  6       c 0.9700548
  7       b 0.5604765
  8       s 0.5553125
  9       d 0.7432414
 10       k 0.3701336

Dummy Variable Code:

# Read that data in

filename = tempcsv # file.choose()
filedata = read.csv(filename, stringsAsFactors = F)

cat_name = "cat_var" #readline(prompt="Enter the categorical name to create its dummy variables: ")

make_dummies = function(filedata,cat_name)
{
  data_cat=filedata[,cat_name]

  if(class(data_cat)=="character")
  {
    freq=sort(table(data_cat))
    freq=freq[-1]
    for( i in names(freq))
    {
      colName = paste(cat_name,i,sep="_")
      filedata[,colName] = ifelse(data_cat==i,1,0) # Note the change here
    }
    filedata[,cat_name]=NULL

    print("Successfully created dummy variables...")
  }else
  {

    print("Please enter a categorical variable with character as its datatype")
  }
  return(filedata)
}

Sample Call:

(filedata = make_dummies(filedata, cat_name))

Output:

   id   num_var cat_var_c cat_var_d cat_var_g cat_var_j cat_var_k cat_var_s cat_var_x cat_var_z
1   1 0.2359040         0         0         0         1         0         0         0         0
2   2 0.3471606         0         0         0         0         0         0         1         0
3   3 0.9049400         0         0         0         0         0         0         0         1
4   4 0.6322996         0         0         0         0         0         0         0         1
5   5 0.6743289         0         0         1         0         0         0         0         0
6   6 0.9700548         1         0         0         0         0         0         0         0
7   7 0.5604765         0         0         0         0         0         0         0         0
8   8 0.5553125         0         0         0         0         0         1         0         0
9   9 0.7432414         0         1         0         0         0         0         0         0
10 10 0.3701336         0         0         0         0         1         0         0         0

Future Use

Also, I would strongly advise using R's built-in model.matrix() function, with the column cast to a factor rather than left as character data.

For example:

model.matrix(~ cat_var - 1, filedata)

Output (from a different random sample than the data above, so the category letters differ):

  cat_vara cat_varg cat_varm cat_varo cat_vart cat_varu cat_varw cat_varz
1         1        0        0        0        0        0        0        0
2         1        0        0        0        0        0        0        0
3         0        0        0        0        0        0        1        0
4         0        0        1        0        0        0        0        0
5         0        1        0        0        0        0        0        0
6         0        0        0        0        1        0        0        0
7         0        0        0        0        1        0        0        0
8         0        0        0        0        0        1        0        0
9         0        0        0        0        0        0        0        1
10        0        0        0        1        0        0        0        0
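A self-contained sketch of the same idea, assuming a tiny made-up data frame `d` (not the data from the question): cast the column to a factor, then let model.matrix() build one indicator column per level.

```r
# Made-up data for illustration only
d <- data.frame(id = 1:4,
                cat_var = c("a", "b", "a", "c"),
                stringsAsFactors = FALSE)

d$cat_var <- factor(d$cat_var)   # cast the character column to a factor

# "- 1" drops the intercept, so every level gets its own 0/1 column
mm <- model.matrix(~ cat_var - 1, data = d)

colnames(mm)    # "cat_vara" "cat_varb" "cat_varc"
rowSums(mm)     # each row has exactly one 1
```

Leaving out the `- 1` keeps an intercept column and omits the first level, which then acts as the reference category.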
coatless
  • One more doubt: when I ran the inner part of the code, with freq[i], outside of a function, it worked perfectly and gave the desired output. Why was it working properly then? I didn't understand this. Could you clarify? – richa1465 Jul 09 '16 at 22:40
  • I have no clue why it would work outside the function but not inside it. Perhaps you defined `freq` differently? Regardless, `table(data_cat)` gives a count for each value, with the values themselves stored as the `names` of the vector. When looping with `for`, you retrieved each `name` and then used it to subset `freq`. This yielded a number (e.g. `1`, `2`, `3`, and so on) instead of the appropriate label. – coatless Jul 09 '16 at 22:46
  • No, I had defined freq in the same manner. My output looked like this: "race_ Amer-Indian-Eskimo" "race_ Asian-Pac-Islander" "race_ Black" "race_ White", and it still looks the same after replacing freq[i] with i. Because of this I got stuck, and I still cannot figure out the difference. – richa1465 Jul 09 '16 at 22:58
0

The lines of the code that I commented out are only commented out because they prevent it from being a reproducible example. Instead I loaded a built-in dataset and used that (instead of an unknown file from my filesystem).

#filename = file.choose()
#filedata = read.csv(filename, stringsAsFactors = F)

data("mtcars")
mtcars$cn <- row.names(mtcars)
filedata <- mtcars
#cat_name = readline(prompt="Enter the categorical name to create its dummy variables: ")
cat_name <- "cn"

colnames(filedata)
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb" "cn"  


f <- function(filedata, cat_name)
{
  data_cat=filedata[,cat_name]

  if(class(data_cat)=="character")
  {
    freq=sort(table(data_cat))
    freq=freq[-1]
    for( i in names(freq))
    {
      colName = paste(cat_name,i,sep="_")
      filedata[,colName] = ifelse(data_cat==i,1,0) # compare with the label i, not the count freq[i]

    }
    filedata[,cat_name]=NULL

    print("Successfully created dummy variables...")
  } else
  {

    print("Please enter a categorical variable with character as its datatype")
  }
  return(filedata)
}

filedata <- f(filedata,cat_name)
colnames(filedata)


 [1] "mpg"                    "cyl"                    "disp"                   "hp"                     "drat"                   "wt"                     "qsec"                  
 [8] "vs"                     "am"                     "gear"                   "carb"                   "cn_Cadillac Fleetwood"  "cn_Camaro Z28"          "cn_Chrysler Imperial"  
[15] "cn_Datsun 710"          "cn_Dodge Challenger"    "cn_Duster 360"          "cn_Ferrari Dino"        "cn_Fiat 128"            "cn_Fiat X1-9"           "cn_Ford Pantera L"     
[22] "cn_Honda Civic"         "cn_Hornet 4 Drive"      "cn_Hornet Sportabout"   "cn_Lincoln Continental" "cn_Lotus Europa"        "cn_Maserati Bora"       "cn_Mazda RX4"          
[29] "cn_Mazda RX4 Wag"       "cn_Merc 230"            "cn_Merc 240D"           "cn_Merc 280"            "cn_Merc 280C"           "cn_Merc 450SE"          "cn_Merc 450SL"         
[36] "cn_Merc 450SLC"         "cn_Pontiac Firebird"    "cn_Porsche 914-2"       "cn_Toyota Corolla"      "cn_Toyota Corona"       "cn_Valiant"             "cn_Volvo 142E"     

Nice function, by the way. I might use it.

Hack-R