Creating Function to extract characters in a given column in R

Question

Here is my "practice" data set:

                       key      date   census
     1: 01_35004_10-14_+_M 11NOV2001 2.934397
     2: 01_35004_10-14_+_M 06JAN2002 3.028231
     3: 01_35004_10-14_+_M 07APR2002 3.180712
     4: 01_35004_10-14_+_M 02JUN2002 3.274546
     5: 01_35004_10-14_+_M 28JUL2002 3.368380
     6: 01_35004_10-14_+_M 22SEP2002 3.462214
     7: 01_35004_10-14_+_M 22DEC2002 3.614694
     8: 01_35004_10-14_+_M 16FEB2003 3.708528
     9: 01_35004_10-14_+_M 13JUL2003 3.954843
    10: 01_35004_10-14_+_M 07SEP2003 4.048677

So, with the code:

     var = c("State", "Zip_Code", "Age_Group", "Race", "Gender")
     df[, var] <- NA
     df[, var] <- sapply(df$key, function(x) unlist(strsplit(as.character(x[1]), "_")))

I am able to extract the components in each row of column "key" and fill them into the columns created in var such that the new data set is (only first observation):

                      key      date   census State Zip_Code Age_Group Race Gender
    1  01_35004_10-14_+_M 11NOV2001 2.934397     1    35004     10-14    +      M

My question is: can I make a "universal" function that can work on any data set and allows users to decide which column they want to extract components from?

For instance maybe there is a different data set which looks like this:

  Chocolate Milk
   Milk_Chocolate

I would like a function that extracts "Milk" and "Chocolate" from it and creates new variables "Ingredient 1" and "Ingredient 2" filled with those components:

  Chocolate Milk       Ingredient 1      Ingredient 2
   Milk_Chocolate         Milk             Chocolate

Here is the function I tried, which wraps the statement above and is called on the "practice" data set:

  f = function(x,y,z) {
     x[, y] = NA
     x[, y] <- sapply(x$z, function(x) unlist(strsplit(as.character(x[1]), "_")))
  }
  f(df,var,key)

But I receive the following error:

 Error in `[<-.data.table`(`*tmp*`, , y, value = list()) : 
 Supplied 5 columns to be assigned an empty list (which may be an empty data.table or data.frame since they are lists too). To delete multiple columns use NULL instead. To add multiple empty list columns, use list(list()).

Please help.

Thanks,

Keith

Essentially, I want it to do exactly what your code did, just on a different data set with a different column. Perhaps the new data set has the column "Chocolate milk" with an observation like Milk_Chocolate. Then your code would extract as a new column :"Ingredient 1" : Milk & a second new column :"Ingredient 2" : Chocolate — Keith, Jun 01 '16 at 16:01
You have a data.table. It is better to use data.table methods. For example, `df[, c("Ingredient1", "Ingredient2") := tstrsplit(ChocolateMilk, "_")]` — akrun, Jun 01 '16 at 16:11
Using the example you've given: say the data set is named df and I set var = c("Ingredient 1", "Ingredient 2"), then create the function f = function(x,y,z) { x[,y := tstrsplit(z, '_')] } and call f(df,var,chocolatemilk). I then receive the error: "Error in as.character(x) : cannot coerce type 'closure' to vector of type 'character'" — Keith, Jun 01 '16 at 16:20
Just as a sidenote for future encounters: if you post data, please post a `dput` or some `data.frame` expression for easy copying. See under the heading `Copy your data` here: http://stackoverflow.com/a/5963610/2378649 — coffeinjunky, Jun 01 '16 at 16:46
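
For reference, the `tstrsplit` suggestion from the comments can be wrapped in a function if the column names are passed as strings rather than bare names. The sketch below is only one possible way to do it: the helper name `split_column`, its arguments, and the `drinks` table are made up for illustration, and it assumes every value splits into the same number of pieces as there are new column names.

```r
library(data.table)

# Hypothetical helper (not from the thread): split column `col` of the
# data.table `dt` on `sep` and store the pieces in the columns `new_cols`.
split_column <- function(dt, new_cols, col, sep = "_") {
  dt <- copy(dt)  # copy first, because := would otherwise modify the caller's table
  # dt[[col]] looks the column up by the string stored in `col`
  pieces <- tstrsplit(dt[[col]], sep, fixed = TRUE)
  stopifnot(length(pieces) == length(new_cols))
  # (new_cols) tells := to use the values of the character vector as names
  dt[, (new_cols) := pieces]
  dt[]
}

# Illustrative data, modelled on the example in the comments
drinks <- data.table(ChocolateMilk = c("Milk_Chocolate", "Juice_Water"))
result <- split_column(drinks, c("Ingredient1", "Ingredient2"), "ChocolateMilk")
print(result)
```

Passing `"ChocolateMilk"` as a string sidesteps any lookup of an unquoted name in the calling environment. The `copy()` is a design choice: drop it if modifying the table in place is what you want.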

score 1 · Accepted Answer · answered Jun 01 '16 at 16:20

This should work:

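f = function(x,y,z) {
  # x: data frame, y: names of the new columns, z: name of the column to split (a string)
  x[, y] = NA
  # x[[z]] extracts the column named by z as a vector; each value is split on "_",
  # and t() turns the result into one row per observation
  x[, y] <- t(sapply(x[[z]], function(x) unlist(strsplit(as.character(x[1]), "_"))) )
  return(x)
}

f(x = df, y = var, z = "key")

# The same function on a different data set and column
df2 <- data.frame(drink = c("Milk_Chocolate", "Juice_Water"))
f(x = df2, y = c("Ingredient 1" , "Ingredient 2"), z = "drink")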
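
The important difference from the function in the question is that the column name is passed as a string and looked up by its value, and that the modified data is returned. As far as I can tell, `x$z` searches for a column literally named `z`, finds none, and returns `NULL`; `sapply()` over `NULL` then gives an empty list, which would explain the `value = list()` in the error message. A small check, using the toy `df2` from above (the variable names `by_dollar`, `by_bracket` and `empty` are only for illustration):

```r
df2 <- data.frame(drink = c("Milk_Chocolate", "Juice_Water"),
                  stringsAsFactors = FALSE)
z <- "drink"

by_dollar  <- df2$z     # looks for a column literally called "z"
by_bracket <- df2[[z]]  # looks for the column whose name is stored in z

print(is.null(by_dollar))  # TRUE: there is no column named "z"
print(by_bracket)

# sapply() over NULL returns an empty list, so there is nothing to assign
empty <- sapply(by_dollar, function(s) unlist(strsplit(s, "_")))
print(length(empty))
```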
