I have a dataframe, call it A, whose columns are Question 1, Question 2, Question 3, and so on, and whose rows are indexed by the person who answered the questions: Person 1, Person 2, and so on. The questions are multiple choice, with a varying number of possible answers. For instance, Question 50 may have 9 possible answers (each person can choose only one), so the entries in the Question 50 column are numbers ranging from 1 to 9.
In order to do some PCA on this dataset, I need to convert these columns to binary form. For instance, column Question 50 will be converted to 9 different columns: Q501, Q502, Q503, ..., Q509. The entry of Q50i in row K will then be 1 if Person K answered i to Question 50, and 0 otherwise. In other words, each new column is an indicator vector marking which people gave a particular response.
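To make the target encoding concrete, here is a minimal sketch on a made-up dataframe. The name toy, the column Q3, and its values are hypothetical and not part of my real data; I am also assuming the answers are coded as integers starting at 1.

```r
# Toy data, made up purely for illustration:
# 4 people answering one question (Q3) whose answers range over 1..3
toy <- data.frame(Q3 = c(2, 1, 3, 2), row.names = paste("Person", 1:4))

# One indicator column per possible answer:
# column Q3i is 1 in the rows where the answer equals i, and 0 elsewhere
for (i in 1:max(toy$Q3)) {
  toy[[paste0("Q3", i)]] <- as.integer(toy$Q3 == i)
}

print(toy)
```

Each row should end up with exactly one 1 across the new columns Q31, Q32, Q33, since each person picks exactly one answer.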
I want to write a function that takes as input a column and does this binary coding of my dataset. I can do this for specifically one column, but when I try to convert the same syntax to a function (so I can apply the function to a range of columns) R can't seem to evaluate my variable. Since I have 122 columns to convert, a function really is necessary.
Here is what worked, for a specific column (50 in this case):
for (i in 1:max(A["Q50"])) {
  A[paste0("Q50", i)] <- ifelse(A["Q50"] == i, 1, 0)
}
Here is the function that I tried to write, which does not work:
binarize <- function(column) {
  for (i in 1:max(A["column"])) {
    A[paste0("column", i)] <- ifelse(A["column"] == i, 1, 0)
  }
}
The error I get is (zip.lingLoc is the actual name of the dataframe I am calling A here):
Error in `[.data.frame`(zip.lingLoc, "column") :
undefined columns selected
Traceback:
4 stop("undefined columns selected")
3 `[.data.frame`(zip.lingLoc, "column")
2 zip.lingLoc["column"]
1 binarize("Q053")
Here is an example:
A is the following dataframe.
   ID Q050
1   1    4
2   2    4
3   3    4
4   4    7
5   5    8
6   6    8
7   7    7
8   8    4
9   9    7
10 10    7
Now I apply the thing that works:
for (i in 1:max(A["Q050"])) {
  A[paste0("Q050", i)] <- ifelse(A["Q050"] == i, 1, 0)
}
And A becomes:
   ID Q050 Q0501 Q0502 Q0503 Q0504 Q0505 Q0506 Q0507 Q0508
1   1    4     0     0     0     1     0     0     0     0
2   2    4     0     0     0     1     0     0     0     0
3   3    4     0     0     0     1     0     0     0     0
4   4    7     0     0     0     0     0     0     1     0
5   5    8     0     0     0     0     0     0     0     1
6   6    8     0     0     0     0     0     0     0     1
7   7    7     0     0     0     0     0     0     1     0
8   8    4     0     0     0     1     0     0     0     0
9   9    7     0     0     0     0     0     0     1     0
10 10    7     0     0     0     0     0     0     1     0
This is exactly what I want, but if I apply my function binarize instead, I get the same error as noted above.
My questions are: what is wrong with my function binarize, and is this the best way to do this? Thank you!