
I have 2 data frames:

a=c("aaaa","aaaaa", "aaaaaaaa")
b=c(3,5,6)
sample1=data.frame(a,b)

a=c("bb","bbbb","bbbbbbb")
b=c(4,6,54)
sample2=data.frame(a,b)

I want to loop through the samples and pass the columns from these dataframes to some functions e.g. nchar(sample1$b)

What should go in the for loop to do this? The code below runs, but it prints the length of the constructed string (e.g. "sample1$b") rather than applying nchar to the column itself.

for(i in 1:2) {

   cat(nchar(eval(paste("sample",i,"$b"))))

}

Thanks

kwicher
  • It seems like a job for two `apply`s combined. Please add a small sample of your data. – SabDeM Jul 13 '15 at 14:31
  • How did you wind up with all these different data.frame variables? In R, it's better to store related items in a list rather than as separate variables in the environment. It makes things much easier. It's always best to include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610) with sample input data so we can run and test your code as well as test possible solutions. – MrFlick Jul 13 '15 at 14:34
  • I am looking for a general solution with the independent data.frame variables. I would like to know if this can be done and if so how. Thanks a lot. – kwicher Jul 13 '15 at 15:14

2 Answers


First, fix the underlying problem, which is that your data frames aren't in a single list, by collecting them with mget:

> l <- mget(x = paste0("sample",1:2))
> l
$sample1
         a b
1     aaaa 3
2    aaaaa 5
3 aaaaaaaa 6

$sample2
        a  b
1      bb  4
2    bbbb  6
3 bbbbbbb 54

Once that problem has been remedied, you can simply use lapply on the resulting list:

> lapply(l,function(x) nchar(x[["b"]]))
$sample1
[1] 1 1 1

$sample2
[1] 1 1 2
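
For reference, here is a self-contained sketch of the two steps above, using the data from the question. The name `counts` is only for illustration. Since `b` is numeric, `nchar` coerces each value to character and counts its digits, which is why 54 gives 2.

```r
# Data from the question
sample1 <- data.frame(a = c("aaaa", "aaaaa", "aaaaaaaa"), b = c(3, 5, 6))
sample2 <- data.frame(a = c("bb", "bbbb", "bbbbbbb"), b = c(4, 6, 54))

# Step 1: collect the data frames by name into a named list
l <- mget(paste0("sample", 1:2))

# Step 2: apply nchar to column b of each data frame
# (counts is an illustrative name, not part of the original answer)
counts <- lapply(l, function(x) nchar(x[["b"]]))
```

Because `mget` names the list elements after the variables, the result can be indexed as `counts$sample1` and `counts$sample2`.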
joran

As suggested by MrFlick, you should store the related dataframes in a list:

samples <- list(sample1, sample2)

This allows you to avoid referring to each dataframe by its name:

lapply(samples, function(smp) nchar(smp$b))

If you really want to use separate variables (you shouldn't!), you can construct each variable's name as a string and use get to return the object with that name:

for (i in 1:2) print(nchar(get(paste0("sample", i))$b))
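
To see why the loop in the question prints the length of a string: `eval` applied to a character string simply returns that string, so `nchar` measures the name rather than the column. A short sketch of the difference (the names `nm`, `via_eval` and `via_get` are only for illustration):

```r
sample1 <- data.frame(a = c("aaaa", "aaaaa", "aaaaaaaa"), b = c(3, 5, 6))

nm <- paste0("sample", 1)    # the string "sample1"

# eval() on a string returns the string unchanged,
# so nchar() counts the characters of the name itself
via_eval <- nchar(eval(nm))

# get() looks up the object that has this name,
# so nchar() sees the actual values in column b
via_get <- nchar(get(nm)$b)
```

Here `via_eval` is 7 (the length of "sample1") while `via_get` holds one count per row. Note also that `paste` separates its arguments with a space by default, so `paste("sample", i)` would not produce a valid variable name; `paste0` avoids that.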
Molx
  • Thank you. That is exactly what I wanted. One question: why is storing the two data.frames in a list better than storing them in two separate variables? Sorry for a lame question. – kwicher Jul 13 '15 at 18:39
  • There are a few reasons for that. 1. It keeps your code organized, since you don't need separate variables sample1, sample2, sample3... 2. You have "direct" access to tools that operate on lists, such as `lapply` and `for` loops, without having to worry about the elements' names. 3. You can move them around easily. If you write a function that needs all the dataframes, you can pass it a single list instead of several different objects. – Molx Jul 13 '15 at 23:15
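
Points 2 and 3 can be illustrated with a minimal sketch. The helper `count_chars` and the list names `first` and `second` are hypothetical, chosen only for this example; `stringsAsFactors = FALSE` is set so that column `a` is character on older R versions too.

```r
sample1 <- data.frame(a = c("aaaa", "aaaaa", "aaaaaaaa"), b = c(3, 5, 6),
                      stringsAsFactors = FALSE)
sample2 <- data.frame(a = c("bb", "bbbb", "bbbbbbb"), b = c(4, 6, 54),
                      stringsAsFactors = FALSE)

# A named list keeps the data frames together and labelled
samples <- list(first = sample1, second = sample2)

# Hypothetical helper: receives all data frames as a single argument
# and returns the character counts of one column for each of them
count_chars <- function(dfs, column) {
  lapply(dfs, function(df) nchar(df[[column]]))
}

res <- count_chars(samples, "a")
```

Adding a third data frame then only means adding one element to `samples`; neither the helper nor the call needs to change.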