Here is my "practice" data set:
key date census
1: 01_35004_10-14_+_M 11NOV2001 2.934397
2: 01_35004_10-14_+_M 06JAN2002 3.028231
3: 01_35004_10-14_+_M 07APR2002 3.180712
4: 01_35004_10-14_+_M 02JUN2002 3.274546
5: 01_35004_10-14_+_M 28JUL2002 3.368380
6: 01_35004_10-14_+_M 22SEP2002 3.462214
7: 01_35004_10-14_+_M 22DEC2002 3.614694
8: 01_35004_10-14_+_M 16FEB2003 3.708528
9: 01_35004_10-14_+_M 13JUL2003 3.954843
10: 01_35004_10-14_+_M 07SEP2003 4.048677
So, with the code:
var = c("State","Zip_Code", "Age_Group", "Race", "Gender")
df[, var] <- NA
df[, var] <- sapply(df$key, function(x) unlist(strsplit(as.character(x[1]), "_")))
I am able to split each value in the "key" column into its components and fill them into the new columns named in var, so that the data set becomes (first observation only):
key date census State Zip_Code Age_Group Race Gender
1 01_35004_10-14_+_M 11NOV2001 2.934397 1 35004 10-14 + M
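For anyone who wants to reproduce the split step, here is a minimal sketch. It assumes a plain data.frame rebuilt from the first three rows above (the original object appears to be a data.table), and it binds the split pieces row-wise so that each row of the result holds the five components of one key. The name `parts` is only illustrative.

```r
# Rebuild the first three rows of the practice data as a plain data.frame
# (an assumption; the original object appears to be a data.table).
df <- data.frame(
  key    = rep("01_35004_10-14_+_M", 3),
  date   = c("11NOV2001", "06JAN2002", "07APR2002"),
  census = c(2.934397, 3.028231, 3.180712),
  stringsAsFactors = FALSE
)

var <- c("State", "Zip_Code", "Age_Group", "Race", "Gender")

# strsplit() returns one character vector per row; rbind them so that
# each row of the matrix holds the five components of one key.
parts <- do.call(rbind, strsplit(as.character(df$key), "_", fixed = TRUE))

# Add the components as new character columns named by var.
df[var] <- as.data.frame(parts, stringsAsFactors = FALSE)
print(df[1, ])
```

Note that the new columns are character vectors in this sketch, so the state code keeps its leading zero unless it is converted afterwards.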
My question is: can I make a "universal" function that works on any data set and lets the user choose which column to extract components from?
For instance maybe there is a different data set which looks like this:
Chocolate Milk
Milk_Chocolate
I would like to apply a function to it that extracts "Milk" and "Chocolate" and creates new variables "Ingredient 1" and "Ingredient 2" filled with those components:
Chocolate Milk Ingredient 1 Ingredient 2
Milk_Chocolate Milk Chocolate
Here is the function I tried, which wraps the statement above, called on the "practice" data set:
f = function(x,y,z) {
x[,y]=NA
x[, y] <- sapply(x$z, function(x) unlist(strsplit(as.character(x[1]), "_")))
}
f(df,var,key)
But I receive the following error:
Error in `[<-.data.table`(`*tmp*`, , y, value = list()) :
Supplied 5 columns to be assigned an empty list (which may be an empty data.table or data.frame since they are lists too). To delete multiple columns use NULL instead. To add multiple empty list columns, use list(list()).
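A likely cause, as far as I can tell: inside f, `x$z` looks for a column literally named "z" rather than the column passed as the argument, so it returns NULL, and sapply() over NULL returns an empty list, which matches the `value = list()` in the error. The function also never returns the modified data, and `key` is passed unquoted. Below is a rough sketch of a more general version under some assumptions: the input is a plain data.frame (a data.table could be converted with as.data.frame()), the column name is passed as a string, and every value splits into the same number of pieces. The name `split_column` and its arguments are hypothetical.

```r
# Hypothetical helper (name and arguments are illustrative):
#   data      - a data.frame
#   column    - name of the column to split, given as a string
#   new_names - names for the new columns
#   sep       - delimiter, matched literally
split_column <- function(data, column, new_names, sep = "_") {
  # [[ ]] looks the column up by the value of `column`, unlike data$column.
  pieces <- strsplit(as.character(data[[column]]), sep, fixed = TRUE)
  # Every value must split into exactly length(new_names) components.
  stopifnot(all(lengths(pieces) == length(new_names)))
  parts <- do.call(rbind, pieces)
  data[new_names] <- as.data.frame(parts, stringsAsFactors = FALSE)
  data  # return the modified copy; the caller must reassign it
}

# The second example data set, with a column name that contains a space.
choc <- data.frame(`Chocolate Milk` = "Milk_Chocolate",
                   check.names = FALSE, stringsAsFactors = FALSE)
choc <- split_column(choc, "Chocolate Milk", c("Ingredient 1", "Ingredient 2"))
print(choc)
```

For the practice data, the equivalent call under the same assumptions would be `df <- split_column(df, "key", var)`.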
Please help.
Thanks,
- Keith