I just recently migrated from STATA to R. I am quite excited but obviously now face all these teething problems of getting used to R and the way it works.
In my work I have also been using STATA for data cleaning, and now I intend to do the same with R. We collect lots of primary data (household-level data) using questionnaires. The data sets are quite big and often consist of a number of different grids (household grids, for example, where we repeat a set of questions for each household member, such as name, age, education etc.). So you might end up with questions such as:
Q3_Name_1 (name of household member 1)
Q3_Age_1 (age of household member 1)
Q3_Edu_1 (education of household member 1)
Q3_Name_2 (name of household member 2)
Q3_Age_2 (age of household member 2)
Q3_Edu_2 (education of household member 2)
Q3_Name_3 (name of household member 3)
Q3_Age_3 (age of household member 3)
Q3_Edu_3 (education of household member 3)
The structure of the data frame more or less looks as follows:
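# one column per household member; answer codes such as "Refused" are mixed in with the ages
df <- data.frame(ID = c(1, 2, 3, 4),
                 Q3_Age_1 = c(1, 3, 2, 5),
                 Q3_Age_2 = c(1, 4, 2, "Refused"),
                 Q3_Age_3 = c(1, 9, 2, 4),
                 Q3_Age_4 = c(1, 11, "Don't know", 5),
                 stringsAsFactors = FALSE)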
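A quick check on the example data, assuming the first column is meant to be Q3_Age_1 like the others: because c() coerces a mix of numbers and strings to character, every column that contains "Refused" or "Don't know" is stored as text rather than as numbers, which is why the age columns need converting back before any arithmetic.

```r
# Rebuild the example data frame (first column assumed to be Q3_Age_1)
df <- data.frame(ID = c(1, 2, 3, 4),
                 Q3_Age_1 = c(1, 3, 2, 5),
                 Q3_Age_2 = c(1, 4, 2, "Refused"),
                 Q3_Age_3 = c(1, 9, 2, 4),
                 Q3_Age_4 = c(1, 11, "Don't know", 5),
                 stringsAsFactors = FALSE)

# Show the storage class of each column
sapply(df, class)
```

Q3_Age_2 and Q3_Age_4 come out as "character"; ID, Q3_Age_1 and Q3_Age_3 stay "numeric".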
If I need to make changes to one question in a household grid (for example, the age question), I most likely need to make the same changes to that question for every other household member in the grid. This is where a loop works best: you write the commands once and R applies them to all the questions in the grid.
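One way to apply the same commands to every question in a grid without writing out each column name is sketched below. It assumes the grid columns share the Q3_Age_<n> naming pattern of the example data and that "Refused" and "Don't know" (in either capitalisation) are the only non-numeric answer codes; clean_age() is a hypothetical helper written for this sketch, not an existing function.

```r
df <- data.frame(ID = c(1, 2, 3, 4),
                 Q3_Age_1 = c(1, 3, 2, 5),
                 Q3_Age_2 = c(1, 4, 2, "Refused"),
                 Q3_Age_3 = c(1, 9, 2, 4),
                 Q3_Age_4 = c(1, 11, "Don't know", 5),
                 stringsAsFactors = FALSE)

# Find every column of the age grid by its name pattern (Q3_Age_1, Q3_Age_2, ...)
age_cols <- grep("^Q3_Age_[0-9]+$", names(df), value = TRUE)

# Hypothetical helper: replace the assumed non-numeric codes with NA, then convert to numeric
clean_age <- function(x) {
  x <- as.character(x)
  x[x %in% c("Refused", "Don't know", "Don't Know")] <- NA
  as.numeric(x)
}

# Apply the same cleaning function to all age columns at once
df[age_cols] <- lapply(df[age_cols], clean_age)
```

Because grep() with value = TRUE returns the matching names, the same code covers a grid of 4 or 11 household members without changes.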
I tried and failed. I still don’t get the principles that govern loops in R. In STATA, the key piece is a placeholder, something like ‘i’, that replaces the number at the end of each question name. What is the equivalent in R? I tried to crack the nut as follows:
i <- 1
while (i <= 11) {
  w3$Q3Age_i <- as.character(w3$Q3Age_i)
  w3$Q3Age_i[w3$Q3Age_i == "Refused" | w3$Q3Age_i == "Don't Know"] <- "NA"
  w3$Q3Age_i <- as.numeric(w3$Q3Age_i)
  i <- i + 1
}
Maybe you can also use ‘repeat’ or the like. But at this stage, I just don’t understand how to tell R that w3$Q3Age_i first refers to w3$Q3Age_1, then to w3$Q3Age_2, and so on.
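In R, $ takes the text after it literally, so w3$Q3Age_i asks for a column named "Q3Age_i" and never substitutes the value of i. The closest analogue of the STATA placeholder is to build each column name as a string with paste0() and index with [[ ]], which uses the value of its argument. A minimal sketch on the example data frame, assuming the Q3_Age_<n> names and four household members (the real data would loop over 1:11):

```r
df <- data.frame(ID = c(1, 2, 3, 4),
                 Q3_Age_1 = c(1, 3, 2, 5),
                 Q3_Age_2 = c(1, 4, 2, "Refused"),
                 Q3_Age_3 = c(1, 9, 2, 4),
                 Q3_Age_4 = c(1, 11, "Don't know", 5),
                 stringsAsFactors = FALSE)

for (i in 1:4) {
  # Build the column name for member i as a string: "Q3_Age_1", "Q3_Age_2", ...
  col <- paste0("Q3_Age_", i)
  # df[[col]] looks up the column whose name is the value of col
  x <- as.character(df[[col]])
  # Assumed non-numeric answer codes become real NA values (not the string "NA")
  x[x %in% c("Refused", "Don't know", "Don't Know")] <- NA
  df[[col]] <- as.numeric(x)
}
```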
Any help or hints would be much appreciated!
Best,
Dom