
I am new to R and need to concatenate two data tables of around 2 million observations and 25 variables each. More precisely, I obtained the two tables by reading two large CSVs with the following R code:

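```r
library(data.table)   # provides fread()
setwd("/Users/cart")

# Each call reads one CSV into a data.table (~2 million rows x 25 columns)
DT2017 <- fread("BNR_2017.csv")
DT2018 <- fread("BNR_2018.csv")
```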

Now I would like to concatenate DT2017 and DT2018 into a single table of around 4 million observations and 25 variables.
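A single bind call should be enough here. BNR_2017.csv and BNR_2018.csv aren't available, so the sketch below writes two tiny stand-in CSVs to a temporary directory. Their columns `id` and `value` are made up for illustration. It then reads them back with `fread()` and stacks them with data.table's `rbindlist()`. Passing a named list together with `idcol` adds a column recording which file each row came from. Without `idcol`, `rbind()` on the two tables gives the same rows.

```r
library(data.table)

# Hypothetical stand-ins for BNR_2017.csv and BNR_2018.csv (columns are made up)
dir <- tempdir()
f2017 <- file.path(dir, "BNR_2017.csv")
f2018 <- file.path(dir, "BNR_2018.csv")
fwrite(data.table(id = 1:3, value = c(10, 20, 30)), f2017)
fwrite(data.table(id = 4:5, value = c(40, 50)), f2018)

# Read both files, as in the question
DT2017 <- fread(f2017)
DT2018 <- fread(f2018)

# Stack them in one call; the list names become the values of the "year" column
DT <- rbindlist(list("2017" = DT2017, "2018" = DT2018), idcol = "year")
print(DT)
```

With the real files, the last step would be `DT <- rbindlist(list(DT2017, DT2018))`. If the two files don't share exactly the same columns, see the `use.names` and `fill` arguments of `rbindlist()`.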

Edited by nkr; asked by Fabio.
  • How about just `rbind(DT2017, DT2018)`? – akrun Mar 18 '19 at 11:11
  • Welcome to Stack Overflow! Please try reading up on how to ask a question, that can be answered by others: https://stackoverflow.com/help/how-to-ask. There are several ways to provide data, probably adding the output of `dput()` or `dput(head())` to your question is sufficient. Avoid adding code or alphanumeric output as images. Consider how to make a good example: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and see how you can change your question accordingly. Edit: in this case, yes, do what @akrun explained. – heck1 Mar 18 '19 at 11:12
  • Use `library(gtools); final <- smartbind(DT2017, DT2018)` – Hunaidkhan Mar 18 '19 at 11:27
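The `smartbind()` suggestion matters when the two files don't have exactly the same columns. In that case, plain `rbind()` on data frames stops with an error. `gtools::smartbind()` and `data.table::rbindlist(..., fill = TRUE)` instead fill the missing columns with NA. The sketch below shows that idea in base R only. `bind_fill()` is a hypothetical helper written for illustration, not how gtools implements it. The toy columns are also made up.

```r
# Hypothetical helper: add any column a table lacks (filled with NA),
# then stack the two tables with the columns in a common order
bind_fill <- function(a, b) {
  all_cols <- union(names(a), names(b))
  for (col in setdiff(all_cols, names(a))) a[[col]] <- NA
  for (col in setdiff(all_cols, names(b))) b[[col]] <- NA
  rbind(a[all_cols], b[all_cols])
}

d2017 <- data.frame(id = 1:2, x = c("a", "b"), stringsAsFactors = FALSE)
d2018 <- data.frame(id = 3:4, y = c(TRUE, FALSE), stringsAsFactors = FALSE)

# Plain rbind() fails because the column names differ
res_plain <- tryCatch(rbind(d2017, d2018), error = function(e) "error")

# The helper keeps all rows; x is NA for the 2018 rows, y for the 2017 rows
combined <- bind_fill(d2017, d2018)
print(combined)
```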

1 Answer


I think you'll have two data frames in R if you use fread (`fread()` returns data.tables, which are also data.frames). However, a simple rbind might not be a good idea with so many rows. I think it is better to preallocate memory by first creating a data frame filled with NAs, then using a loop to copy each row into it.

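```r
df1 <- data.frame(an = 1:2000, b1 = 4001:6000, b3 = rep('abc', 2000),
                  stringsAsFactors = FALSE)
df2 <- data.frame(an = 1:2000, b1 = 4001:6000, b3 = rep('abc', 2000),
                  stringsAsFactors = FALSE)

# Preallocate room for both inputs (2000 + 2000 rows), with typed NA columns
df <- data.frame(an = rep(NA_integer_, 4000),
                 b1 = rep(NA_integer_, 4000),
                 b3 = rep(NA_character_, 4000),
                 stringsAsFactors = FALSE)

### create a simple loop: overwrite row i of df with row i of df1
for (i in seq_len(nrow(df1))) {
  df[i, ] <- df1[i, ]
}
```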

Then do the same for df2, writing its rows below those of df1: `df[nrow(df1) + i, ] <- df2[i, ]`.
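Putting both steps together, here is a sketch of the preallocate-and-fill approach for df1 and df2. It uses the same toy data as the code above, with a row offset for df2 and a final check against a single `rbind()`. The column types of the preallocated frame are chosen to match these toy inputs. For real data they would have to match your 25 columns, which is an assumption here.

```r
df1 <- data.frame(an = 1:2000, b1 = 4001:6000, b3 = rep("abc", 2000),
                  stringsAsFactors = FALSE)
df2 <- data.frame(an = 1:2000, b1 = 4001:6000, b3 = rep("abc", 2000),
                  stringsAsFactors = FALSE)
n1 <- nrow(df1)
n2 <- nrow(df2)

# Preallocate once, sized for both inputs, with column types matching them
df <- data.frame(an = rep(NA_integer_, n1 + n2),
                 b1 = rep(NA_integer_, n1 + n2),
                 b3 = rep(NA_character_, n1 + n2),
                 stringsAsFactors = FALSE)

# Copy df1 into rows 1..n1, then df2 into rows (n1 + 1)..(n1 + n2)
for (i in seq_len(n1)) df[i, ] <- df1[i, ]
for (i in seq_len(n2)) df[n1 + i, ] <- df2[i, ]

# The filled frame holds the same values as a single rbind()
stopifnot(isTRUE(all.equal(df, rbind(df1, df2), check.attributes = FALSE)))
```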

Answered by Andrei Niță
  • If you preallocate memory, it's better to do `df[i, ] <- df1[i, ]`. There is no need for `rbind`. Your code is wrong: it will **extend** `df` with rows at the bottom. Try it with smaller df's. – Rui Barradas Mar 18 '19 at 12:21
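To see the difference described in the comment above, here is a small sketch with 3 data rows and 6 preallocated rows. The names `small1`, `prealloc`, `grown`, `filled` and `block` are made up for illustration. Growing with `rbind()` appends below the NA rows. Assigning with `[<-` fills the preallocated rows in place. The last variant assigns the whole block at once instead of looping row by row.

```r
small1 <- data.frame(an = 1:3, b1 = 4001:4003, b3 = rep("abc", 3),
                     stringsAsFactors = FALSE)
prealloc <- data.frame(an = rep(NA_integer_, 6),
                       b1 = rep(NA_integer_, 6),
                       b3 = rep(NA_character_, 6),
                       stringsAsFactors = FALSE)

# Looping with rbind(): rows are appended after the six NA rows
grown <- prealloc
for (i in seq_len(nrow(small1))) grown <- rbind(grown, small1[i, ])

# Looping with assignment: the first three preallocated rows are overwritten
filled <- prealloc
for (i in seq_len(nrow(small1))) filled[i, ] <- small1[i, ]

# Same result as the assignment loop, without a per-row loop
block <- prealloc
block[seq_len(nrow(small1)), ] <- small1

nrow(grown)   # 9: six NA rows followed by the three data rows
nrow(filled)  # 6: first three rows filled, last three still NA
```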