I'm trying to combine two columns, Previous$Col1 and Previous$Col3, into one data.frame.

This is what I tried to do:

x<-data.frame(Previous$Col1)
y<-data.frame(Previous$Col3)
z<-merge(x,y)

And for some reason I got this error on the console:

Error: cannot allocate vector of size 24.0 Gb

What's going wrong and what should I do to fix this?

How could a data frame with two columns with 80000ish rows take up 24 GB of memory?

Thanks!

Florian
  • The error says that you ran out of memory. It is pretty uncommon to "merge" two vectors. What is your desired output? – lmo Jul 20 '17 at 17:22
  • https://stackoverflow.com/questions/5171593/r-memory-management-cannot-allocate-vector-of-size-n-mb – AidanGawronski Jul 20 '17 at 17:23
  • Possible duplicate of [R memory management / cannot allocate vector of size n Mb](https://stackoverflow.com/questions/5171593/r-memory-management-cannot-allocate-vector-of-size-n-mb) – AidanGawronski Jul 20 '17 at 17:23
  • Is this one column you are trying to merge? Might work to append and then remove dupes.... – pyll Jul 20 '17 at 17:24
  • But why did I run out of memory? I'm only trying to merge two 80000-row columns? How could that possibly be a 24 GB vector? – RaiderNAYSHUN Jul 20 '17 at 17:24
  • @RaiderNAYSHUN - That's something `Rstudio` is not good at, if you are using it. Check this [comment](https://stackoverflow.com/questions/44894649/optimizing-apply-in-r#comment76839667_44895521). – Chetan Arvind Patil Jul 20 '17 at 17:27
  • We can't know why you ran out of memory without seeing your data. It looks like `merge` on two vectors is equivalent to `expand.grid`, as seen here: `merge(1:3, 2:4)` and `expand.grid(1:3, 2:4)`. So you would end up with a Cartesian product of the vectors, which is 6.4 billion rows with two columns. – lmo Jul 20 '17 at 17:29
  • @lmo, my desired output is just a data frame with the two columns combined – RaiderNAYSHUN Jul 20 '17 at 17:29
  • Can you just do `cbind(x,y)`? – Mako212 Jul 20 '17 at 17:31

2 Answers

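You are creating a full Cartesian product: x and y share no column names, so merge pairs every row of one with every row of the other. The result has 80000*80000 = 6.4e+09 rows and two columns, for a total of 1.28e+10 elements (about 51 GB at 4 bytes per element, if I am correct). What are you trying to accomplish with your merge?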

> x<-data.frame(a= c("a","b"))
> y<-data.frame(b= c(1,2,3))
> z<-merge(x,y)
> x
  a
1 a
2 b
> y
  b
1 1
2 2
3 3
> z
  a b
1 a 1
2 b 1
3 a 2
4 b 2
5 a 3
6 b 3
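
The size in the error message can be checked with some rough arithmetic. The sketch below assumes exactly 80000 rows per column (the question only says "80000ish") and 4 bytes per element, which is what R uses for integer values and factor codes; double columns would need 8 bytes each.

```r
# Assumed sizes: the question says "80000ish" rows, so these are approximations.
n_x <- 80000
n_y <- 80000

# merge() with no shared column names returns every pairing of rows.
rows <- n_x * n_y                  # 6.4e9 rows

# One column of the result, at 4 bytes per element (integer or factor codes).
gib_per_column <- rows * 4 / 2^30
gib_per_column                     # roughly 23.8

# Small-scale check that merge() really multiplies the row counts.
small <- merge(data.frame(a = 1:3), data.frame(b = 1:4))
nrow(small)                        # 12
```

That is close to the 24.0 Gb in the error, which suggests R gave up while allocating a single column of the product.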

You could do data.frame(Col3 = Previous$Col3, Col1= Previous$Col1) to achieve what you want.
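
As a small sketch of that fix, the toy `Previous` below is made up for illustration, since the real data is not shown in the question. Building the data frame directly keeps the rows aligned, so the result has exactly as many rows as `Previous`.

```r
# Hypothetical stand-in for the asker's data; the real Previous is not shown.
Previous <- data.frame(Col1 = c("a", "b", "c"),
                       Col2 = c(TRUE, FALSE, TRUE),
                       Col3 = c(10, 20, 30),
                       stringsAsFactors = FALSE)

# Pair the two columns row by row instead of merging them.
z <- data.frame(Col3 = Previous$Col3, Col1 = Previous$Col1)

# Equivalent base R subsetting, keeping the original column order.
z2 <- Previous[, c("Col1", "Col3")]

nrow(z)    # 3, same as nrow(Previous)
names(z2)  # "Col1" "Col3"
```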

Florian

Try using bind_cols from the dplyr package or cbind from base R.

bind_cols(Previous$Col1,Previous$Col3)

or

cbind(Previous$Col1,Previous$Col3)  # returns a matrix; wrap in as.data.frame() to get a data frame

Additionally, since these columns come from the same original data.frame, select() from the dplyr package could be used:

select(Previous,Col1,Col3)
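
One difference between these options is the type of the result. Here is a minimal base R sketch, using a made-up `Previous` since the real data is not shown; the dplyr calls are left out so that the example runs without extra packages.

```r
# Hypothetical stand-in for the asker's data.
Previous <- data.frame(Col1 = c("a", "b", "c"),
                       Col3 = c(10, 20, 30),
                       stringsAsFactors = FALSE)

# cbind() on two bare vectors builds a matrix, so mixed types are coerced
# to a single type (character here).
m <- cbind(Previous$Col1, Previous$Col3)
is.matrix(m)        # TRUE
typeof(m)           # "character"

# cbind() on single-bracket subsets (which are data frames) returns a
# data frame and keeps each column's type.
d <- cbind(Previous["Col1"], Previous["Col3"])
is.data.frame(d)    # TRUE
is.numeric(d$Col3)  # TRUE
```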