103

I have a for loop which produces a data frame after each iteration. I want to append all data frames together but finding it difficult. Following is what I am trying, please suggest how to fix it:

d = NULL
for (i in 1:7) {

  # vector output
  model <- #some processing

  # add vector to a dataframe
  df <- data.frame(model)

}

df_total <- rbind(d,df)
steveb
  • 5,382
  • 2
  • 27
  • 36
Ibe
  • 5,615
  • 7
  • 32
  • 45

9 Answers9

231

Don't do it inside the loop. Make a list, then combine them outside the loop.

n = 5
datalist = list()
# or pre-allocate for slightly more efficiency
datalist = vector("list", length = n)

for (i in 1:n) {
    # ... make some data
    dat <- data.frame(x = rnorm(10), y = runif(10))
    dat$i <- i  # maybe you want to keep track of which iteration produced it?
    datalist[[i]] <- dat # add it to your list
}

big_data = do.call(rbind, datalist)
# or big_data <- dplyr::bind_rows(datalist)
# or big_data <- data.table::rbindlist(datalist)

This is a much more R-like way to do things. It can also be substantially faster, especially if you use dplyr::bind_rows or data.table::rbindlist for the final combining of data frames.

Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
  • It also works but i want to write all lists to separate columns – Ibe Apr 02 '15 at 18:05
  • @Ibe you really should edit your question then. Provide sample data and desired output (see my comment in maRtin's answer) and take `rbind` out of your question, because all `rbind` does is bind rows together. – Gregor Thomas Apr 02 '15 at 18:12
  • 2
    just replaced `rbind` in your code with `cbind`. It worked and now I have all lists in separate columns – Ibe Apr 02 '15 at 18:16
  • 2
    Similarly, if this step seems slow to you, try `dplyr::bind_cols()`, – Gregor Thomas Apr 02 '15 at 19:16
  • if you're making a large data set, worth looking at creating it using `data.table` too. i.e. if reading in .csv in the first step of the function use `fread` and not `read.csv` – DaveRGP Jun 03 '16 at 14:39
  • Hi, thank you for this post, it is really helpful. Can I as what do.call is doing? – halo09876 Nov 15 '18 at 18:50
  • 4
    `do.call(rbind, datalist)` is the way to write in a nice way `rbind(datalist[[1]], datalist[[2]], datalist[[3]], ...)` – Gregor Thomas Nov 15 '18 at 19:08
  • `datalist = vector("list", length = 5)` is better and will append each element of the list faster, as far as I know – Julien Aug 24 '22 at 15:14
  • @Julien yes, that is better practice overall--I'll edit my answer to reflect that. Though I think the differences are pretty negligible unless you're doing 10,000+ iterations. – Gregor Thomas Aug 24 '22 at 15:26
27

You should try this:

df_total = data.frame()
for (i in 1:7){
    # vector output
    model <- #some processing

    # add vector to a dataframe
    df <- data.frame(model)
    df_total <- rbind(df_total,df)
}
Stephopolis
  • 1,765
  • 9
  • 36
  • 65
maRtin
  • 6,336
  • 11
  • 43
  • 66
  • 1
    It appends frame after frame in single column. How do i append all data frames into separate columns? – Ibe Apr 02 '15 at 00:15
  • use cbind() instead of rbind() – maRtin Apr 02 '15 at 00:16
  • 1
    use of `cbind()` resulted in `Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 0, 18262 ` – Ibe Apr 02 '15 at 00:17
  • 1
    did you reset the df_total = data.frame() before you reexecuted the loop? – maRtin Apr 02 '15 at 00:18
  • no reset and each iteration from `for loop` will give same number of rows – Ibe Apr 02 '15 at 00:19
  • cant help you without sample data, sorry. do the columns have a name? try both check.names = TRUE or check.names = FALSE – maRtin Apr 02 '15 at 00:21
  • columns don't have name. data is large in size. how do i present it here? – Ibe Apr 02 '15 at 15:21
  • @Ibe we don't need the whole thing, just `dput(head(my_data))` would be good (it looks ugly but it's copy/paste-able). Also post `dim(my_data)` since it sounds like the number of rows differ. Do that for two different values of `i`. – Gregor Thomas Apr 02 '15 at 17:59
4

Again maRtin is correct but for this to work you have start with a dataframe that already has at least one column

model <- #some processing
df <- data.frame(col1=model)

for (i in 2:17)
{
     model <- # some processing
     nextcol <-  data.frame(model)
     colnames(nextcol) <- c(paste("col", i, sep="")) # rename the comlum
     df <- cbind(df, nextcol)
}
jwilley44
  • 173
  • 6
  • 1
    it works but appends data in a single column. I want all data frames to be in separate columns – Ibe Apr 02 '15 at 04:00
4

Try to use rbindlist approach over rbind as it's very, very fast.

Example:

library(data.table)

##### example 1: slow processing ######

table.1 <- data.frame(x = NA, y = NA)
time.taken <- 0
for( i in 1:100) {
  start.time = Sys.time()
  x <- rnorm(100)
  y <- x/2 +x/3
  z <- cbind.data.frame(x = x, y = y)

  table.1 <- rbind(table.1, z)
  end.time <- Sys.time()
  time.taken  <- (end.time - start.time) + time.taken

}
print(time.taken)
> Time difference of 0.1637917 secs

####example 2: faster processing #####

table.2 <- list()
t0 <- 0
for( i in 1:100) {
  s0 = Sys.time()
  x <- rnorm(100)
  y <- x/2 + x/3

  z <- cbind.data.frame(x = x, y = y)

  table.2[[i]] <- z

  e0 <- Sys.time()
  t0  <- (e0 - s0) + t0

}
s1 = Sys.time()
table.3 <- rbindlist(table.2)
e1 = Sys.time()

t1  <- (e1-s1) + t0
t1
> Time difference of 0.03064394 secs
Stephen Rauch
  • 47,830
  • 31
  • 106
  • 135
Ayushi Kachhara
  • 159
  • 1
  • 9
2

In the Coursera course, an Introduction to R Programming, this skill was tested. They gave all the students 332 separate csv files and asked them to programmatically combined several of the files to calculate the mean value of the pollutant.

This was my solution:

  # create your empty dataframe so you can append to it.
  combined_df <- data.frame(Date=as.Date(character()),
                    Sulfate=double(),
                    Nitrate=double(),
                    ID=integer())
  # for loop for the range of documents to combine
  for(i in min(id): max(id)) {
    # using sprintf to add on leading zeros as the file names had leading zeros
    read <- read.csv(paste(getwd(),"/",directory, "/",sprintf("%03d", i),".csv", sep=""))
    # in your loop, add the files that you read to the combined_df
    combined_df <- rbind(combined_df, read)
  }
Simon
  • 19,658
  • 27
  • 149
  • 217
1

For me, it worked very simply. At first, I made an empty data.frame, then in each iteration I added one column to it. Here is my code:

df <- data.frame(modelForOneIteration)
for(i in 1:10){
  model <- # some processing
  df[,i] = model
}
saleh sereshki
  • 2,402
  • 3
  • 17
  • 18
0

Here are some tidyverse and custom function options that might work depending on your needs:

library(tidyverse)

# custom function to generate, filter, and mutate the data:
combine_dfs <- function(i){
 data_frame(x = rnorm(5), y = runif(5)) %>% 
    filter(x < y) %>% 
    mutate(x_plus_y = x + y) %>% 
    mutate(i = i)
}

df <- 1:5 %>% map_df(~combine_dfs(.))
df <- map_df(1:5, ~combine_dfs(.)) # both give the same results
> df %>% head()
# A tibble: 6 x 4
       x      y x_plus_y     i
   <dbl>  <dbl>    <dbl> <int>
1 -0.973 0.673    -0.300     1
2 -0.553 0.0463   -0.507     1
3  0.250 0.716     0.967     2
4 -0.745 0.0640   -0.681     2
5 -0.736 0.228    -0.508     2
6 -0.365 0.496     0.131     3

You could do something similar if you had a directory of files that needed to be combined:

dir_path <- '/path/to/data/test_directory/'
list.files(dir_path)

combine_files <- function(path, file){
  read_csv(paste0(path, file)) %>% 
    filter(a < b) %>% 
    mutate(a_plus_b = a + b) %>% 
    mutate(file_name = file) 
}

df <- list.files(dir_path, '\\.csv$') %>% 
  map_df(~combine_files(dir_path, .))

# or if you have Excel files, using the readxl package:
combine_xl_files <- function(path, file){
  readxl::read_xlsx(paste0(path, file)) %>% 
    filter(a < b) %>% 
    mutate(a_plus_b = a + b) %>% 
    mutate(file_name = file) 
}

df <- list.files(dir_path, '\\.xlsx$') %>% 
  map_df(~combine_xl_files(dir_path, .))
sbha
  • 9,802
  • 2
  • 74
  • 62
  • 1
    It seems strange to name a function `combine_files` when it operates on a single file and doesn't combine it with anything... – Gregor Thomas Apr 15 '20 at 13:32
-1
x <- c(1:10) 

# empty data frame with variables ----

df <- data.frame(x1=character(),
                     y1=character())

for (i in x) {
  a1 <- c(x1 == paste0("The number is ",x[i]),y1 == paste0("This is another number ", x[i]))
  df <- rbind(df,a1)
}

names(df) <- c("st_column","nd_column")
View(df)

that might be a good way to do so....

  • 2
    I didn't vote on this post (only edited it for format), but see some issues with it, both from a code standpoint and from an answer standpoint. For the code, `<-` does not name parameters in a function call (e.g. `c()`) but is the assignment operator (named parameters use `=`). From an answer standpoint, calling `rbind` in the loop is essentially the same as Simon's answer. – Matthew Lundberg Jun 10 '20 at 05:05
-2

"""Produce Multiple DataFrames from the unique TFs via a Groupby"""

        i=0
        dfs_list=[]
        for i in range(i,len(df_CDL)):
            df = df_CDL[i]
            print(df,'Only 1 df_CDL')

            dfs= []
            for _, dataframe in df.groupby('TFs'):
                print('What is going on here?15',dataframe)
                dfs.append([dataframe])
            dfs_list.append([dfs])

#Index any dataframe you want or loop through them all. Whatever..

       print('Test?10', dfs[1], 'Test?20')
       print('What is going on here? 1', dfs_list[5], 'What is 
       going on here again? 2')