How to add data to R data frame

Question

I can't imagine this should be that difficult, but coming from Python, my mindset is probably biased.

I know I'm going to carry out 50 calculations, and the result of each calculation, together with two parameters characterizing it, should build up a data frame.

So my approach is to instantiate the data frame and then add each result as soon as it becomes available. Please see the indicated line below:

# Number of simulations
nsim = 50

# The data frame which should carry the calculation (parameters and solutions).
sol <- data.frame(col.names=c("ni", "Xbar", "n"))

# Fifty values for n.
n <- seq.int(5, 5000, length.out=nsim)

for(ni in n)
{
    # A random sample containing possible duplicates.
    X <- sample(seq(-ni, ni, length=ni+1), replace=T)
    Xbar <- round(mean(X), 3)
    sol <- rbind(sol, c(ni, Xbar, n))  # <<-- How to do this correctly??
}

This doesn't work.

score 3 · Accepted Answer · answered Jul 30 '14 at 11:01

There are two ways to do this correctly. One is to pre-define your data.frame at its final size and then populate it row by row in a for loop:

nsim <- 10 # reduce to 10 to simplify output
n <- seq.int(5, 5000, length.out=nsim)

sol <- setNames(data.frame(matrix(nrow=nsim, ncol=3)), c("ni", "Xbar", "n"))

set.seed(1) # for reproducibility
for(ni in seq_along(n)) {
    Xbar <- round(mean(sample(seq(-n[ni], n[ni], length.out=n[ni]+1), replace=TRUE)), 3)
    sol[ni,] <- c(ni, Xbar, n[ni])
}

Alternatively, you can use sapply on your n vector to create a vector of results and then combine everything into a data frame in one call:

set.seed(1) # for reproducibility
sol <- data.frame(
    ni = seq_along(n),
    Xbar = sapply(n, function(ni) {
        round(mean(sample(seq(-ni, ni, length.out=ni+1), replace=TRUE)), 3)
    }),
    n = n
)

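Either way, you'll end up with a nice data frame (in the sapply version, ni comes from seq_along and is an integer column, so str() reports it as int rather than num):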

> str(sol)
'data.frame':   10 obs. of  3 variables:
 $ ni  : num  1 2 3 4 5 6 7 8 9 10
 $ Xbar: num  0.667 -0.232 -14.599 -26.026 36.51 ...
 $ n   : num  5 560 1115 1670 2225 ...

@thelatemail Correct. Fixed. Feel free to just edit that kind of stuff on my answers in the future. :) — Thomas, Jul 30 '14 at 11:04
Ok so I need to instantiate the data frame with its size from the start. — TMOTTM, Jul 30 '14 at 11:12
@TMOTTM You don't *have* to do it, but it is many times more efficient to initialize and fill rather than repeatedly `rbind` because the `rbind` will copy the data.frame in memory each time. — Thomas, Jul 30 '14 at 11:29
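
If the final size isn't known in advance, one common alternative to repeated rbind (not part of the answer above; a sketch only) is to collect each result in a list and bind everything once at the end. This reuses the sampling code from the answer with nsim = 10:

```r
set.seed(1)
n <- seq.int(5, 5000, length.out = 10)

rows <- list()  # grows by one small one-row data frame per result
for (i in seq_along(n)) {
  X <- sample(seq(-n[i], n[i], length.out = n[i] + 1), replace = TRUE)
  rows[[length(rows) + 1]] <- data.frame(ni = i, Xbar = round(mean(X), 3), n = n[i])
}

# Bind all rows in a single call instead of once per iteration
sol <- do.call(rbind, rows)
str(sol)
```

The data frame is assembled only once, so it is not copied on every pass through the loop.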

score 1 · Answer 2 · edited May 23 '17 at 12:11

1) Check what your initial sol contains.

> sol <- data.frame(col.names=c("ni", "Xbar", "n"))
> sol
  col.names
1        ni
2      Xbar
3         n

Not what you want. See this question.
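
For comparison, here is one way (a sketch, not necessarily what the linked question recommends) to create an empty data frame that really has the three columns ni, Xbar and n. The row values appended below are made-up placeholders:

```r
# Three zero-length numeric columns with the intended names
sol <- data.frame(ni = numeric(0), Xbar = numeric(0), n = numeric(0))
str(sol)

# Appending works when the new row is a one-row data frame with the same names
# (placeholder values, not real results)
sol <- rbind(sol, data.frame(ni = 5, Xbar = 0.5, n = 5))
sol
```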

2) Make sure seq.int does what you expect: check its documentation (or just its output). For example, look at what n contains:

> n
 [1]    5.0000  106.9388  208.8776  310.8163  412.7551  514.6939  616.6327
 [8]  718.5714  820.5102  922.4490 1024.3878 1126.3265 1228.2653 1330.2041
[15] 1432.1429 1534.0816 1636.0204 1737.9592 1839.8980 1941.8367 2043.7755
[22] 2145.7143 2247.6531 2349.5918 2451.5306 2553.4694 2655.4082 2757.3469
[29] 2859.2857 2961.2245 3063.1633 3165.1020 3267.0408 3368.9796 3470.9184
[36] 3572.8571 3674.7959 3776.7347 3878.6735 3980.6122 4082.5510 4184.4898
[43] 4286.4286 4388.3673 4490.3061 4592.2449 4694.1837 4796.1224 4898.0612
[50] 5000.0000

Is that what you meant?
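
To make the point concrete: n contains fractional values, and seq() silently rounds a fractional length.out up. Rounding n to whole numbers is only an assumption about what was intended:

```r
nsim <- 50
n <- seq.int(5, 5000, length.out = nsim)

n[2]  # 106.9388, not a whole number
# A fractional length.out is rounded up, so this sequence has 108 elements
length(seq(-n[2], n[2], length.out = n[2] + 1))

# If whole-number sample sizes were intended (an assumption), round explicitly
n_int <- round(n)
head(n_int)
```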

3) Given (1), the problems are not surprising, but in any case, just step through the first pass of the loop one line at a time and see what happens:

nsim = 50
n <- seq.int(5, 5000, length.out=nsim)  # n as defined in the question
sol <- data.frame(col.names=c("ni", "Xbar", "n"))
ni=5
X <- sample(seq(-ni, ni, length=ni+1), replace=T)
Xbar <- round(mean(X), 3)
sol <- rbind(sol, c(ni, Xbar, n))  
print(sol)

Gives:

Warning message:
In `[<-.factor`(`*tmp*`, ri, value = 5) :
  invalid factor level, NA generated
>     print(sol)
  col.names
1        ni
2      Xbar
3         n
4      <NA>

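Now the behavior is unsurprising: we can't add three columns to something with one column. (Note also that c(ni, Xbar, n) has 52 elements, not 3, because n is the whole vector of 50 values.)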

4) You don't want to do it this way anyway. It's better to initialize sol to be its final size and then fill it in.

See, for example, this answer
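
A minimal sketch of that filling-in step (one possible approach, not taken from the linked answer): pre-allocate a numeric vector, fill element i on each pass through the loop, and build the data frame once at the end. This avoids both the repeated rbind and row-by-row assignment into a data frame:

```r
set.seed(1)
nsim <- 50
n <- seq.int(5, 5000, length.out = nsim)

Xbar <- numeric(nsim)  # pre-allocated result vector (all zeros)
for (i in seq_along(n)) {
  X <- sample(seq(-n[i], n[i], length.out = n[i] + 1), replace = TRUE)
  Xbar[i] <- round(mean(X), 3)  # fill in the i-th result
}

# Build the data frame once, after all results are available
sol <- data.frame(ni = seq_along(n), Xbar = Xbar, n = n)
head(sol)
```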

However, the more common R idiom would be to avoid explicit loops where possible; functions such as sapply, vapply or lapply let you create the whole thing at once.
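
For example, a sketch using vapply (chosen here for illustration; the answer does not name a specific function), which behaves like sapply but checks that each call returns exactly one number:

```r
set.seed(1)
n <- seq.int(5, 5000, length.out = 50)

# vapply works like sapply, but errors unless every call returns one numeric value
Xbar <- vapply(n, function(size) {
  X <- sample(seq(-size, size, length.out = size + 1), replace = TRUE)
  round(mean(X), 3)
}, FUN.VALUE = numeric(1))

sol <- data.frame(ni = seq_along(n), Xbar = Xbar, n = n)
str(sol)
```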

1-3) I get that. 4) The purpose of the question is exactly to learn how to do that "filling in". That's my main problem of understanding. — TMOTTM, Jul 30 '14 at 10:30
3) I assume that by defining `col.names`, I'm defining the number of columns, am I not? — TMOTTM, Jul 30 '14 at 10:38
You're not. You're defining a variable called `col.names`, which is why you get one column, not 3. `data.frame` doesn't have a `col.names` argument. See `?data.frame`. I've added a link to my answer to clarify the issue in your first comment there. — Glen_b, Jul 30 '14 at 11:08
You're right; I misread the `row.names` argument in `?data.frame`. — TMOTTM, Jul 30 '14 at 11:10

score 0 · Answer 3 · answered Jul 30 '14 at 11:21

First of all, can you clarify the output format you expect? As of now, if I modify the code so that it produces a data frame, it generates the following output (let me know if this is what you expect; if so, it's not difficult to generate):

ni       Xbar     n
10.000   2.182   12.000

If this is what you expect, then one way to do this would be as follows:

Step 1: Create Vectors

Step 2: Create Data frame from the above vectors

Step 3: Run your operations in a loop and fill in the data frame row by row.

nsim <- 50
n <- seq.int(5, 5000, length.out = nsim)
ni <- vector(mode = "numeric", length = nsim)    # pre-allocated column vectors
Xbar <- vector(mode = "numeric", length = nsim)
out <- data.frame(ni = ni, Xbar = Xbar, n = n)   # data frame at its final size

for (i in seq_along(n)) {
  X <- sample(seq(-n[i], n[i], length.out = n[i] + 1), replace = TRUE)
  out[i, "Xbar"] <- round(mean(X), 3)   # fill in row i
  out[i, "ni"] <- n[i]
}

The output is as follows:

[Image: screenshot of the resulting data frame]
