
I have been using DBSCAN on my dataset to find and remove outliers. I'm running it in Google Colab, which gives me about 25.63 GB of RAM. I need to set my eps value above 2.0, but as soon as I go over 2.0 the code uses all of the memory and the runtime crashes. I don't know how to fix this. I will attach my code below; as for the error message, there is none: Google Colab just says it is restarting because it ran out of memory.

My code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.cluster import DBSCAN

df = pd.read_csv('Final After Simple Filtering.csv', index_col=None, low_memory=True)

# Drop columns with low feature importance
del df['AmbTemp_DegC']
del df['NacelleOrientation_Deg']
del df['MeasuredYawError']

# fit_predict returns the label array directly, so assign it to a new name
# instead of shadowing the DBSCAN class
labels = DBSCAN(eps=2.0, min_samples=100).fit_predict(df)

# Noise points are labelled -1, so don't count them as a cluster
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_noise_ = list(labels).count(-1)
print('Estimated number of clusters: %d' % n_clusters_)
print('Estimated number of noise points: %d' % n_noise_)
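
For completeness, once the labels are available, removing the points DBSCAN flags as noise is just a boolean mask over the DataFrame. A minimal sketch, assuming df and labels come from the code above:

# DBSCAN marks outliers/noise with the label -1; keep everything else.
mask = labels != -1
df_clean = df[mask].reset_index(drop=True)
print('Removed %d outliers, kept %d rows' % ((labels == -1).sum(), len(df_clean)))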


Thank you.

  • Can you tell us which version of scikit-learn you are using? – alift Mar 04 '20 at 02:17
  • Does this answer your question? [scikit-learn DBSCAN memory usage](https://stackoverflow.com/questions/16381577/scikit-learn-dbscan-memory-usage) – alift Mar 04 '20 at 02:19
  • @alift I'm using version 0.22.1; I will read the linked page now. Thank you very much. – AliY Mar 04 '20 at 02:29
  • @alift I've read the page and there were a few suggested solutions, one of which is using ELKI, which I would prefer not to use. I will try changing the algorithm to "kd_tree" or changing the metric to "haversine"; if neither of those works, they suggest using OPTICS, which is similar to DBSCAN, although I would prefer to find a solution that keeps DBSCAN (a sketch of those options follows this thread). – AliY Mar 04 '20 at 02:40
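
A minimal sketch of the two alternatives raised in that last comment, assuming the same df and parameters as in the question (untested on this dataset; the haversine metric is only meaningful for latitude/longitude data, so it is omitted):

from sklearn.cluster import DBSCAN, OPTICS

# Option 1: force a KD-tree neighbourhood index instead of algorithm='auto'.
labels = DBSCAN(eps=2.0, min_samples=100, algorithm='kd_tree').fit_predict(df)

# Option 2: OPTICS, extracting a DBSCAN-like clustering at eps=2.0;
# max_eps bounds the neighbourhood search, which keeps memory usage lower.
labels = OPTICS(min_samples=100, max_eps=2.0, cluster_method='dbscan', eps=2.0).fit_predict(df)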
