Loop through columns of a data frame in R

Question

I have the following problem:

levelsvar <- c("arrears", "expenses", "warmhome", "telephone", "colorTV", "washer", "car", "meatfish", "holiday")

variables <- NULL

for (i in 1:length(levelsvar)) {

variables <- sapply(levelstest, function(x) (length(test$levelsvar[i][test$country==x & test$levelsvar[i]=="1"]) + length(test$levelsvar[i][test$country==x & test$levelsvar[i]=="2"])) / length(test$levelsvar[i][test$country==x]))

}

I want to use a for loop to apply the calculation above to each of the 9 variables in "levelsvar". I tried various approaches, but none of them worked. I think the problem is that R reads

test$"arrears"

instead of

test$arrears

I already tried to use noquote() but it didn't help.

Do you have a solution to this problem?

Thank you in advance!

edit:

with example

levelstest <- c("AT", "BE")

levelsvar <- c("arrears", "expenses", "warmhome", "telephone", "colorTV", "washer", "car", "meatfish", "holiday")

structure(list(country = c("AT", "AT", "AT", "BE", "BE", "BE"
), arrears = c(1L, 1L, 1L, 2L, 1L, 1L), expenses = c(3L, 1L, 
3L, 1L, 1L, 2L), warmhome = c(1L, 2L, 2L, 1L, 1L, 1L), telephone = c(4L, 
1L, 4L, 4L, 3L, 3L), colorTV = c(2L, 1L, 3L, 4L, 3L, 1L), washer = c(4L, 
1L, 3L, 3L, 1L, 2L), car = c(4L, 4L, 4L, 4L, 3L, 2L), meatfish = c(2L, 
1L, 1L, 4L, 1L, 1L), holiday = c(2L, 2L, 1L, 3L, 4L, 2L)), .Names = c("country", 
"arrears", "expenses", "warmhome", "telephone", "colorTV", "washer", 
"car", "meatfish", "holiday"), row.names = c(NA, 6L), class = "data.frame")

Now I tried

variables <- NULL

for (i in 1:length(levelsvar)) {

variables <- sapply(levelstest, function(x) (length(test[levelsvar[i]][test$country==x & test[levelsvar[i]]=="1"]) + length(test[levelsvar[i]][test$country==x & test[levelsvar[i]]=="2"])) / length(test[levelsvar[i]][test$country==x]))

  }

but this doesn't work.

Hm, it doesn't work. R returns: `Error in `[.data.frame`(test[levelsvar[1]], test$country == "AT") : undefined columns selected`. Could it be because I would need `> head(test[2][test[1]=="AT"]) [1] 1 2 1 2 1 1` instead of `> head(test[levelsvar[1]]) arrears 1 1 2 1 3 1 4 2 5 1 6 1`? Ah, sorry, the formatting is different in the comment section. — r-newbie, Jul 23 '15 at 14:59
Can't tell without a reproducible example http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610 — kasterma, Jul 23 '15 at 15:01
i have the solution now! thank you very much @kasterma for your first comment, that was very helpful! — r-newbie, Jul 23 '15 at 15:43
The example is not helpful because we don't know your object `test`. Would you please add `dput(head(test))` to your question and explain (verbally) what you want to achieve? The function you tried to define (use `{` instead of `(` for the function body!) doesn't look very useful; probably you mean `sum` instead of `length`. But this is hard to tell when knowing neither your data nor what you actually want to achieve. — CL., Jul 23 '15 at 15:44
@r-newbie The example helps, though, as user2706569 points out, you forgot to name `test`. More important now is that you add an answer explaining the solution, and accept it. The solution (as you rightly pointed out) is not complete in my comment. — kasterma, Jul 23 '15 at 16:04

score 0 · Answer 1 · answered Jul 23 '15 at 16:56

What I wanted to compute, for every variable in levelsvar and every country in levelstest, is the share of rows with the value 1 or 2. For arrears, that is (length(test$arrears[test$country==x & test$arrears=="1"]) + length(test$arrears[test$country==x & test$arrears=="2"])) / length(test$arrears[test$country==x]).

The solution to my problem is the following:

test <- structure(list(
  country   = c("AT", "AT", "AT", "BE", "BE", "BE"),
  arrears   = c(1L, 1L, 1L, 2L, 1L, 1L),
  expenses  = c(3L, 1L, 3L, 1L, 1L, 2L),
  warmhome  = c(1L, 2L, 2L, 1L, 1L, 1L),
  telephone = c(4L, 1L, 4L, 4L, 3L, 3L),
  colorTV   = c(2L, 1L, 3L, 4L, 3L, 1L),
  washer    = c(4L, 1L, 3L, 3L, 1L, 2L),
  car       = c(4L, 4L, 4L, 4L, 3L, 2L),
  meatfish  = c(2L, 1L, 1L, 4L, 1L, 1L),
  holiday   = c(2L, 2L, 1L, 3L, 4L, 2L)
), .Names = c("country", "arrears", "expenses", "warmhome", "telephone",
              "colorTV", "washer", "car", "meatfish", "holiday"),
   row.names = c(NA, 6L), class = "data.frame")

levelsvar <- c("arrears", "expenses", "warmhome", "telephone", "colorTV", "washer", "car", "meatfish", "holiday")

levelstest <- c("AT", "BE")

variables <- NULL

for (i in 1:length(levelsvar)) {
  variables <- cbind(variables, sapply(levelstest, function(x)
    (length(test[levelsvar[i]][test[1]==x & test[levelsvar[i]]=="1"]) +
       length(test[levelsvar[i]][test[1]==x & test[levelsvar[i]]=="2"])) /
      length(test[levelsvar[i]][test[1]==x])))
}

Great; you should point out the interesting parts of the solution, though. The first is the change from test$levelsvar[i] to test[levelsvar[i]], since bracket indexing of data frames accepts column names as strings. The second is the change from test$country to test[1]: the former returns the country column as a vector, the latter returns a data frame with all columns other than the first removed. For better readability you could use test["country"] instead. — kasterma, Jul 24 '15 at 05:59
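The difference between the indexing forms mentioned in the comment above can be checked on a tiny example. This is only a sketch: the data frame `df` and the variable `v` are made up for illustration and are not part of the original question.

```r
# Hypothetical two-row data frame, used only to illustrate the indexing forms
df <- data.frame(country = c("AT", "BE"), arrears = c(1L, 2L))
v <- "arrears"  # column name held in a variable, like levelsvar[i]

is.null(df$v)    # `$` looks for a column literally named "v", so this is NULL
class(df[v])     # single brackets keep a one-column data frame
class(df[[v]])   # double brackets extract the column as a plain vector
identical(df[[v]], df$arrears)  # same vector as writing the name out
```

When a plain vector is wanted (as with test$country), `[[` with a string is probably the closest replacement for `$`.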

score 0 · Answer 2 · answered Jul 23 '15 at 20:29

All you need is test and this:

# For each column except country, compute per country the share of values equal to 1 or 2
apply(test[-1], MARGIN = 2, function(x) {
  tapply(x, test$country, function(y) {
    sum(y %in% c(1, 2)) / length(y)
  })
})

apply() with MARGIN = 2 iterates over the columns, and tapply() applies a custom function within each group (here, country). The result keeps your variable names as column names. test[-1] drops the country column.
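If only some columns should be processed, a similar result can be obtained by looping over the column names themselves. The following is a minimal sketch, assuming a shortened copy of the example data with just two of the nine variables; `shares` is a name introduced here for illustration.

```r
# Shortened version of the example data (only two of the nine variables)
test <- data.frame(
  country  = c("AT", "AT", "AT", "BE", "BE", "BE"),
  arrears  = c(1L, 1L, 1L, 2L, 1L, 1L),
  expenses = c(3L, 1L, 3L, 1L, 1L, 2L)
)
levelsvar <- c("arrears", "expenses")

# For each named column, take the mean of the logical "value is 1 or 2"
# within each country; the mean of a logical vector is the share of TRUEs
shares <- sapply(levelsvar, function(v) {
  tapply(test[[v]] %in% c(1, 2), test$country, mean)
})
shares
```

Here `test[[v]]` extracts each column by its name as a string, which is the step the original loop was missing.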
