
I have two large data sets. The first is numeric, with 60759 objects and 15 features; the second is categorical, with 60759 objects and 9 features. I am trying to calculate the Euclidean distance for the numeric data set and the simple matching distance for the categorical data set, but I cannot compute them because of the size of the data.

Does anyone have an idea of how we can handle large data in R?

  • Please reformulate your question -- see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Jean Sep 07 '17 at 08:18
  • At a glance, 60k+ rows / <20 columns doesn't seem large. How are your calculations done? Perhaps there are ways to make the code more efficient. – Z.Lin Sep 07 '17 at 08:38
  • Have you tried the fread() function in the data.table package to read large data files? – Juanma Sep 07 '17 at 09:27

2 Answers


You can use Microsoft R Open with the RevoScaleR library. RevoScaleR is designed to handle large amounts of data by breaking them into smaller chunks; a minimal sketch of that workflow is included after the links below.

Look up here:

https://learn.microsoft.com/en-us/r-server/r/concept-what-is-revoscaler
https://learn.microsoft.com/en-us/r-server/r-client/what-is-microsoft-r-client
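
For illustration, here is a minimal sketch of the chunked import step, assuming the rxImport() and rxGetInfo() functions from RevoScaleR; the file name numeric_data.csv is a hypothetical stand-in for your numeric data set:

# A minimal sketch, assuming RevoScaleR is available (Microsoft R Open / R Client).
library(RevoScaleR)

# "numeric_data.csv" is a hypothetical file holding the 60759 x 15 numeric data set.
# rxImport() writes it to an .xdf file, which RevoScaleR then reads back in blocks
# rather than keeping everything in memory at once.
rxImport(inData = "numeric_data.csv", outFile = "numeric_data.xdf", overwrite = TRUE)

# Inspect the imported data without loading it all into RAM.
rxGetInfo("numeric_data.xdf", getVarInfo = TRUE)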


You can try the parallelDist package, which is implemented in C++ and is multithreaded:
https://cran.r-project.org/web/packages/parallelDist/parallelDist.pdf

library(parallelDist)
parDist(x, method = "euclidean")  # x must be a numeric matrix
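
As a rough usage sketch (the object num_data is hypothetical; parDist() also takes a threads argument to set the number of worker threads):

library(parallelDist)

# num_data: hypothetical data frame holding the 60759 x 15 numeric features.
num_mat <- as.matrix(num_data)

# Euclidean distances computed in parallel on 4 threads (adjust to your machine).
d <- parDist(num_mat, method = "euclidean", threads = 4)

Keep in mind that the resulting dist object for 60759 rows still holds about 1.8 billion pairwise distances (on the order of 15 GB as doubles), so memory stays the main constraint however the distances are computed.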