
I have two large data sets. The first is numeric, with 60759 objects and 15 features; the second is categorical, with 60759 objects and 9 features. I am trying to calculate the Euclidean distance for the numeric data set and the simple matching distance for the categorical data set, but I cannot compute them because of the size of the data.

Does anyone have an idea of how we can handle large data in R?

  • Please reformulate your question -- see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Jean Sep 07 '17 at 08:18
  • At a glance, 60k+ rows / <20 columns doesn't seem large. How are your calculations done? Perhaps there are ways to make the code more efficient. – Z.Lin Sep 07 '17 at 08:38
  • Have you tried the fread() function in the data.table package to read large data files? – Juanma Sep 07 '17 at 09:27

2 Answers


You can use Microsoft R Open with the RevoScaleR library. RevoScaleR is designed to handle large amounts of data by breaking them into smaller chunks; a minimal sketch of that workflow is included after the links below.

Look up here:

https://learn.microsoft.com/en-us/r-server/r/concept-what-is-revoscaler
https://learn.microsoft.com/en-us/r-server/r-client/what-is-microsoft-r-client
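
For illustration, here is a minimal sketch of the chunked import step, assuming the rxImport() and rxGetInfo() functions from RevoScaleR; the file name numeric_data.csv is a hypothetical stand-in for your numeric data set:

# A minimal sketch, assuming RevoScaleR is available (Microsoft R Open / R Client).
library(RevoScaleR)

# "numeric_data.csv" is a hypothetical file holding the 60759 x 15 numeric data set.
# rxImport() writes it to an .xdf file, which RevoScaleR then reads back in blocks
# rather than keeping everything in memory at once.
rxImport(inData = "numeric_data.csv", outFile = "numeric_data.xdf", overwrite = TRUE)

# Inspect the imported data without loading it all into RAM.
rxGetInfo("numeric_data.xdf", getVarInfo = TRUE)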


You can try the parallelDist package, which is implemented in C++ and is multithreaded:
https://cran.r-project.org/web/packages/parallelDist/parallelDist.pdf

library(parallelDist)
parDist(x, method = "euclidean")  # x must be a numeric matrix
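
As a rough usage sketch (the object num_data is hypothetical; parDist() also takes a threads argument to set the number of worker threads):

library(parallelDist)

# num_data: hypothetical data frame holding the 60759 x 15 numeric features.
num_mat <- as.matrix(num_data)

# Euclidean distances computed in parallel on 4 threads (adjust to your machine).
d <- parDist(num_mat, method = "euclidean", threads = 4)

Keep in mind that the resulting dist object for 60759 rows still holds about 1.8 billion pairwise distances (on the order of 15 GB as doubles), so memory stays the main constraint however the distances are computed.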