
Let's say I want to make a data frame with a numeric column and a character column:

df<-data.frame()
for(i in 1:26) {
  df<-rbind(df, cbind(x=i, y=toString(i)))
 }
str(df)
'data.frame':   26 obs. of  2 variables:
 $ x: Factor w/ 26 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
  ..- attr(*, "names")= chr  "x" "x" "x" "x" ...
 $ y: Factor w/ 26 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
  ..- attr(*, "names")= chr  "y" "y" "y" "y" ...

Oops, I didn't want factors.

df2<-data.frame()
for(i in 1:26) {
   df2<-rbind(df2, cbind(x=i, y=toString(i)), stringsAsFactors=FALSE)
  }
str(df2)
'data.frame':   26 obs. of  2 variables:
 $ x: chr  "1" "2" "3" "4" ...
 $ y: chr  "1" "2" "3" "4" ...

Now everything is a character. The only way I can figure out to avoid this is by constructing separate vectors and then forming the data frame at the end:

x<-NULL
y<-NULL
for(i in 1:26) {
  x<-c(x, i)
  y<-c(y, toString(i))
 }
df3<-data.frame(x, y, stringsAsFactors=FALSE)
str(df3)
'data.frame':   26 obs. of  2 variables:
 $ x: int  1 2 3 4 5 6 7 8 9 10 ...
 $ y: chr  "1" "2" "3" "4" ...

But as you can see, this requires extra code. If you have a data frame with 20 columns, you need 20 initialization statements before the loop and 20 statements inside the loop to add to the vectors.

Is there a more concise way of accomplishing this?

Cyrus Mohammadian
Craig W
  • I think it is better to have it as `list` to avoid the type conversion – akrun Sep 14 '16 at 14:13
  • Do you have to assign the df at every step? that seems very inefficient. Why not just lapply all the steps, then `do.call(rbind, list)`? – Shape Sep 14 '16 at 14:15
  • **Never** add rows to a data.frame in a loop. This looks like a typical XY problem to me: you are not describing the actual problem, but rather asking for help with a very bad solution. Instead of describing how you are trying to solve it, I would suggest you describe what you are actually trying to achieve. – David Arenburg Sep 14 '16 at 14:18
  • The main issue is that `cbind` coerces the union of `x` and `y` to a matrix. Every element in the matrix has to be the same type. So `x` is becoming a character string. There are a number of ways you can get around this. The "best" solution will vary depending on what it is you are actually trying to do, what your initial inputs are, and how you are managing those inputs inside the loop. @DavidArenburg is correct, we need to know more about your actual intent to give you meaningful assistance. – Benjamin Sep 14 '16 at 14:22
  • @DavidArenburg: My actual problem involves a for loop which contains a lot of operations, the result of which is ~20 summary statistics (mostly numeric but some strings). In the end I want a data frame with a row for every iteration of the loop. – Craig W Sep 14 '16 at 14:24
  • Maybe try to simplify this to a problem where you are trying to calculate one or two statistics for a small data set. Create an MWE and provide your desired output. See [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – David Arenburg Sep 14 '16 at 14:26
  • **a.** A loop is still not the best way to do what you're doing. **b.** `tibble::data_frame` can reference previously created variables while creating one, so you can do `tibble::data_frame(x = 1:26, y = as.character(x))`. In base, you can do the same thing in two steps: `df <- data.frame(x = 1:26) ; df$y <- as.character(df$x)` – alistaire Sep 14 '16 at 14:28
  • @alistaire or just use `I` as in `df <- data.frame(x = 1:26, y = I(as.character(1:26)))` – David Arenburg Sep 14 '16 at 14:34
  • @DavidArenburg Ooh, I like that `I` usage. – alistaire Sep 14 '16 at 14:36
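
The coercion described in the comments above can be checked directly. This is only a minimal sketch (the names `m` and `row` are made up for illustration): `cbind` on atomic values returns a matrix, so the number is converted to a character string before `rbind` ever sees it, whereas a one-row `data.frame` keeps a separate type per column.

```r
# cbind() on atomic values builds a matrix, and a matrix holds a single type,
# so the integer 1L is coerced to the string "1".
m <- cbind(x = 1L, y = toString(1L))
is.matrix(m)
typeof(m)

# A one-row data.frame keeps one type per column.
row <- data.frame(x = 1L, y = toString(1L), stringsAsFactors = FALSE)
sapply(row, class)
```

This is why replacing `cbind(...)` with `data.frame(..., stringsAsFactors = FALSE)` inside `rbind` preserves the column types.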

2 Answers

3

Do not do this. Growing an object in a loop is very slow because every step allocates a new object and copies the old contents into it. As the comments told you, it's unlikely that you need to loop over rows at all. However, if you do need to, you should pre-allocate vectors, assign into them, and combine them into a data.frame after the loop. The reason for using separate vectors in the loop (alternatively, you could use a list of vectors) is that data.frame subset assignment is also slow.

x <- integer(26)
y <- character(26)
for(i in 1:26) {
  x[i] <- i
  y[i] <- toString(i)
}

df <- data.frame(x, y, stringsAsFactors = FALSE)
str(df)
#'data.frame':  26 obs. of  2 variables:
# $ x: int  1 2 3 4 5 6 7 8 9 10 ...
# $ y: chr  "1" "2" "3" "4" ...
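
As a rough illustration of the difference, the sketch below wraps both approaches in functions (`grow` and `prealloc` are names invented for this example) and checks that they produce the same data frame. Timings depend on the machine and R version, so none are quoted here; compare `t_grow` and `t_pre` yourself.

```r
# Grow the vectors with c() on every iteration; each call builds a new vector.
grow <- function(n) {
  x <- NULL
  y <- NULL
  for (i in seq_len(n)) {
    x <- c(x, i)
    y <- c(y, toString(i))
  }
  data.frame(x, y, stringsAsFactors = FALSE)
}

# Pre-allocate full-length vectors and assign into them by index.
prealloc <- function(n) {
  x <- integer(n)
  y <- character(n)
  for (i in seq_len(n)) {
    x[i] <- i
    y[i] <- toString(i)
  }
  data.frame(x, y, stringsAsFactors = FALSE)
}

n <- 1e4
t_grow <- system.time(df_grow <- grow(n))
t_pre  <- system.time(df_pre <- prealloc(n))

# Both approaches should yield the same result; only the cost differs.
identical(df_grow, df_pre)
```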

If you have many columns, you should at least know their classes. Then you could do this:

colclasses <- c("integer", "character")
l <- lapply(colclasses, vector, length = 26)
for(i in 1:26) {
  l[[1]][i] <- i
  l[[2]][i] <- toString(i)
}
names(l) <- c("x", "y")
df <- as.data.frame(l, stringsAsFactors = FALSE)

Edit:

Since you want something concise, consider using lapply.

l <- lapply(1:26, function(i) list(x = i, y = toString(i)))
df <- do.call(rbind.data.frame, l)
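
A variation on the `lapply` idea, sketched under the assumption that the body of the loop can be moved into a function returning one named list per iteration. `compute_row` is a hypothetical stand-in for the real computation, and `z` is an extra column added only to show mixed types. Converting each list to a one-row data.frame with an explicit `stringsAsFactors = FALSE` keeps strings as characters on R versions before 4.0.0, where the default was `TRUE`.

```r
# Hypothetical stand-in for the real per-iteration computation.
compute_row <- function(i) {
  list(x = i, y = toString(i), z = i / 2)
}

rows <- lapply(1:26, compute_row)

# Turn each list into a one-row data.frame, then bind them in a single call.
rows <- lapply(rows, as.data.frame, stringsAsFactors = FALSE)
df <- do.call(rbind, rows)
str(df)
```

Adding another column then only means adding one element to the list returned by `compute_row`; no separate initialization line is needed.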
Roland
  • If I'm not concerned about performance (my data frame is small and memory management time will be minor in comparison to the computation inside the loop) but I am concerned about conciseness of the code, is there no way to accomplish this without a separate initialization and appending line for each column of the data frame? – Craig W Sep 14 '16 at 14:49
  • If you want concise code, you shouldn't use a `for` loop. You have been told repeatedly that there are probably better (more efficient *and* more elegant) solutions to your actual problem. PS: There is really no excuse to grow an object in a loop. You wouldn't do it in other languages. And the reason is that it is a cardinal performance sin. – Roland Sep 14 '16 at 14:52
  • In general I avoid `for` loops but in this case I see no way around it. – Craig W Sep 14 '16 at 14:53
  • Well, show what you are actually trying to do and maybe others see a way around it. – Roland Sep 14 '16 at 14:54
  • Yes, give me a second while I paste in thousands of lines of highly esoteric R code. – Craig W Sep 14 '16 at 14:59
  • I'll give you hours, days, or weeks to come up with a minimal reproducible example. – Roland Sep 14 '16 at 15:00
  • I believe that's what I have: a for loop which generates a numeric and a string at each iteration, which I would like to put into a data frame with those column types. You just have to imagine you can't generate the data frame without the for loop. – Craig W Sep 14 '16 at 15:05
  • OK? And why doesn't my answer help you achieve that? I'm not sure what your requirements are. Fewer lines of code? Maybe the last edit suits your needs. – Roland Sep 14 '16 at 15:10
-3

I know this will be downvoted to oblivion, but here's a solution my colleague came up with:

df<-data.frame()
for(i in 1:26) {
    df<-rbind(df, data.frame(x=i, y=toString(i), stringsAsFactors=FALSE))
}
str(df)
'data.frame':   26 obs. of  2 variables:
 $ x: int  1 2 3 4 5 6 7 8 9 10 ...
 $ y: chr  "1" "2" "3" "4" ...

Performance is probably poor but it's the kind of concise solution I was looking for.

Craig W