I have a dataset with 50+ million rows and 2 columns that I want to split into 4 clusters with kmeans. I keep running into memory issues (unexplained RStudio and PC crashes) when using kmeans. I tried bigkmeans instead, but I get a std::bad_alloc error.
So next I would like to create, say, 5 or 10 random samples of about 2 million rows each, run kmeans on each sample, and put the results into a single data frame.
There is probably an elegant way to do this with apply or something similar, but I am not familiar with it, so I am looking for some help.
Here is how I would do this once:
df_sample <- df[sample(nrow(df),2000000),]
k4_s1 <- kmeans(df_sample,iter.max = 50,centers = 4, nstart = 50)
I could put it in a for loop, but there is probably something more efficient. Any help is appreciated.
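To make the goal concrete, here is a rough sketch of the kind of thing I have in mind, using lapply. It is only a guess at the shape of a solution: the small random `df`, the helper name `run_one`, the values of `n_samples` and `sample_size`, and the choice to keep the centers, sizes, and within-cluster sums of squares are all assumptions for illustration. As far as I understand, lapply is not inherently faster than a for loop here; it mainly makes collecting the results tidier.

```r
set.seed(42)  # make the random sampling reproducible

# Hypothetical stand-in for the real 50+ million row, 2-column data frame
df <- data.frame(x = rnorm(10000), y = rnorm(10000))

n_samples   <- 5     # number of random samples to draw
sample_size <- 2000  # would be 2000000 on the real data

# Run kmeans on one random sample and return one row per cluster
run_one <- function(i) {
  idx <- sample(nrow(df), sample_size)
  fit <- kmeans(df[idx, ], centers = 4, iter.max = 50, nstart = 5)
  data.frame(
    sample_id = i,
    cluster   = seq_len(nrow(fit$centers)),
    fit$centers,            # expands to one column per variable (x, y)
    size      = fit$size,
    withinss  = fit$withinss,
    row.names = NULL
  )
}

# One data frame with n_samples * 4 rows
results <- do.call(rbind, lapply(seq_len(n_samples), run_one))
```

One caveat with this sketch: the cluster numbers are assigned arbitrarily in each run, so cluster 1 in one sample does not necessarily correspond to cluster 1 in another. The centers would have to be matched up (for example, by sorting on one coordinate) before comparing them across samples.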