Try separate() from the tidyr package:
library(tidyr)
df_sep <- separate(df, key, into=c("State","Zip_Code", "Age_Group", "Race", "Gender"), sep="_")
This gives:
State Zip_Code Age_Group Race Gender date census
1 01 35004 10-14 + M 11NOV2001 2.934397
2 01 35004 10-14 + M 06JAN2002 3.028231
3 01 35004 10-14 + M 07APR2002 3.180712
4 01 35004 10-14 + M 02JUN2002 3.274546
5 01 35004 10-14 + M 28JUL2002 3.368380
6 01 35004 10-14 + M 22SEP2002 3.462214
7 01 35004 10-14 + M 22DEC2002 3.614694
8 01 35004 10-14 + M 16FEB2003 3.708528
9 01 35004 10-14 + M 13JUL2003 3.954843
10 01 35004 10-14 + M 07SEP2003 4.048677
Edit: Alright, your comments make it clear that you really want a solution that loops through the observations. That is an inefficient approach and, for good reason, typically considered bad practice. Having expressed my objections, let me show you one approach:
First, we need to add the new columns to the data frame. With your approach, this would be:
Var = c("State","Zip_Code", "Age_Group", "Race", "Gender")
for (j in Var) {
  df <- within(df, assign(j, NA))
}
However, a more efficient approach would be:
df[, Var]<- NA
Both give:
head(df)
key date census State Zip_Code Age_Group Race Gender
1 01_35004_10-14_+_M 11NOV2001 2.934397 NA NA NA NA NA
2 01_35004_10-14_+_M 06JAN2002 3.028231 NA NA NA NA NA
3 01_35004_10-14_+_M 07APR2002 3.180712 NA NA NA NA NA
4 01_35004_10-14_+_M 02JUN2002 3.274546 NA NA NA NA NA
5 01_35004_10-14_+_M 28JUL2002 3.368380 NA NA NA NA NA
6 01_35004_10-14_+_M 22SEP2002 3.462214 NA NA NA NA NA
Now, for each observation, we want to split key into its components and fill columns 4 to 8 with the corresponding elements. This is achieved with the following:
df[, Var] <- t(sapply(df$key, function(x) unlist(strsplit(as.character(x[1]), "_"))))
Here, sapply loops through the elements of df$key, passes each element as an argument to the function that I have defined, and collects the results in a matrix with one column per element.
See:
sapply(df$key, function(x) unlist(strsplit(as.character(x[1]), "_")))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] "01" "01" "01" "01" "01" "01" "01" "01" "01" "01"
[2,] "35004" "35004" "35004" "35004" "35004" "35004" "35004" "35004" "35004" "35004"
[3,] "10-14" "10-14" "10-14" "10-14" "10-14" "10-14" "10-14" "10-14" "10-14" "10-14"
[4,] "+" "+" "+" "+" "+" "+" "+" "+" "+" "+"
[5,] "M" "M" "M" "M" "M" "M" "M" "M" "M" "M"
Transposing it with t() makes sure that it "fits" into the data frame columns df[, Var]. Here you can see that the results are identical:
identical(df[,Var], df_sep[Var])
[1] TRUE
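To make the shapes concrete, here is a small sketch using made-up keys (the vector keys and its values are assumptions for illustration, not your data). It only shows why the transpose is needed before assigning into the data frame:

```r
# Made-up keys in the same "_"-separated format as df$key.
keys <- c("01_35004_10-14_+_M", "01_35004_15-19_+_F", "02_99501_10-14_-_M")

# sapply() returns one column per key, so components end up in rows.
parts <- sapply(keys, function(x) unlist(strsplit(x, "_")), USE.NAMES = FALSE)
dim(parts)      # 5 rows (components) by 3 columns (keys)

# After t(), each key is a row and each component a column,
# which matches the layout of df[, Var].
dim(t(parts))   # 3 rows (keys) by 5 columns (components)
```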
I assume that some of the entries in df$key differ in their format, which is why you may want to check each value first. To do so, you can extend the function in the sapply call.
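As a rough sketch of such a check, the function below returns NA for every field whenever a key does not split into exactly five parts. The helper name split_key, its arguments, and the example data are hypothetical. The rule "exactly five components" is an assumption too, so adapt it to whatever irregularities your keys actually have:

```r
# Hypothetical helper: split one key, or return all-NA fields
# if the key does not have the expected number of components.
split_key <- function(x, n = 5, sep = "_") {
  parts <- strsplit(as.character(x), sep, fixed = TRUE)[[1]]
  if (length(parts) != n) {
    return(rep(NA_character_, n))
  }
  parts
}

Var <- c("State", "Zip_Code", "Age_Group", "Race", "Gender")

# Made-up data; the second key is deliberately malformed.
df <- data.frame(
  key = c("01_35004_10-14_+_M", "01_35004_M"),
  stringsAsFactors = FALSE
)
df[, Var] <- NA

# vapply() is used here instead of sapply() so that the result is
# always a 5-row character matrix, even for a single-row data frame.
df[, Var] <- t(vapply(df$key, split_key, character(5), USE.NAMES = FALSE))
df
```

Rows with a malformed key then show up as NA in all five new columns, which makes them easy to find with is.na() afterwards.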