I am working on a dataset and want to build an anomaly detection model for it. The data contains only three features: Latitude, Longitude, and Speed. I normalized it, applied t-SNE, and then normalized again. There is no labeled or target data, so this has to be unsupervised anomaly detection.
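For reference, the preprocessing described above can be sketched roughly as follows. This is only an illustration on synthetic data: the choice of StandardScaler, the t-SNE settings, the value ranges, and the variable names (raw, features_scaled, embedded) are assumptions, since the actual scaler and t-SNE parameters are not stated.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(42)

# Synthetic stand-in for the private data: columns are Latitude, Longitude, Speed
raw = np.column_stack([
    rng.uniform(40.0, 41.0, size=300),   # Latitude (assumed range)
    rng.uniform(28.0, 29.0, size=300),   # Longitude (assumed range)
    rng.uniform(0.0, 120.0, size=300),   # Speed (assumed range)
])

# Step 1: normalize the three features
features_scaled = StandardScaler().fit_transform(raw)

# Step 2: embed into 2-D with t-SNE (perplexity is an assumed value)
embedded = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(features_scaled)

# Step 3: normalize the embedding again before fitting a detector
data_scaled = StandardScaler().fit_transform(embedded)

print(data_scaled.shape)  # (300, 2)
```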
I cannot share the data since it is private, but it looks like this:
There are some abnormal values in the data, such as:
Here's the final shape of the data:
As you can see, the data is a bit complicated. When I searched for abnormal instances manually (by looking at feature values), I observed that the instances inside the red circle (in the image below) should be detected as anomalies.
The instances inside the red region should be abnormal:
I used OneClassSVM to detect anomalies. Here are the parameters:
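nu = 0.02
kernel = "rbf"
gamma = 0.1
degree = 3  # only used by the "poly" kernel; has no effect with "rbf"
verbose = False
random_state = rng  # rng is defined earlier (not shown)
And the model:
from sklearn import svm

# fit the model
# Note: OneClassSVM ignored random_state, and the argument was removed in scikit-learn 0.22;
# drop it when running on a newer version.
clf = svm.OneClassSVM(nu=nu, kernel=kernel, gamma=gamma, verbose=verbose, random_state=random_state)
clf.fit(data_scaled)
# predict returns +1 for inliers and -1 for points flagged as anomalies
y_pred_train = clf.predict(data_scaled)
# number of training instances flagged as anomalies
n_error_train = y_pred_train[y_pred_train == -1].size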
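One way to sanity-check the result is to compare the share of flagged points with nu, which scikit-learn documents as an upper bound on the fraction of training errors. The following is a minimal sketch on synthetic 2-D data; the data and the names X, scores, and frac_flagged are stand-ins, not the real dataset.

```python
import numpy as np
from sklearn import svm

rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 2))  # synthetic stand-in for data_scaled

clf = svm.OneClassSVM(nu=0.02, kernel="rbf", gamma=0.1)
clf.fit(X)

y_pred = clf.predict(X)             # +1 = inlier, -1 = flagged as anomaly
scores = clf.decision_function(X)   # negative scores lie outside the learned boundary

n_flagged = int((y_pred == -1).sum())
frac_flagged = n_flagged / len(X)
print(f"flagged {n_flagged} of {len(X)} points ({frac_flagged:.1%}), nu = 0.02")
```

Sorting the points by scores instead of relying only on predict also makes it possible to inspect the most extreme instances first.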
Here is what I obtained at the end. These are the anomalies detected by OneClassSVM; the red instances were flagged as anomalies:
So, as you can see, the model predicted many instances as anomalies, but in reality, most of these instances should be normal.
I tried different values for nu, gamma, and degree. However, I could not find a suitable decision boundary that detects only the real anomalies.
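The kind of parameter search described here could look like the sketch below. The grid values and the synthetic data are illustrative assumptions, not the settings actually tried; degree is left out because scikit-learn only uses it for the "poly" kernel.

```python
import numpy as np
from sklearn import svm

rng = np.random.RandomState(0)
# Synthetic stand-in: a dense cluster plus a few scattered points
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(500, 2)),
    rng.uniform(low=-6.0, high=6.0, size=(10, 2)),
])

results = {}
for nu in (0.01, 0.02, 0.05):
    for gamma in (0.01, 0.1, 1.0):
        clf = svm.OneClassSVM(nu=nu, kernel="rbf", gamma=gamma).fit(X)
        # count the training points that fall outside the learned boundary
        n_flagged = int((clf.predict(X) == -1).sum())
        results[(nu, gamma)] = n_flagged
        print(f"nu={nu:<5} gamma={gamma:<5} flagged={n_flagged}")
```

Plotting the flagged points for each setting, rather than only counting them, would show whether the boundary moves toward the region of interest.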
- What is wrong with my model? Should I try a different anomaly detection algorithm?
- Is my data not appropriate for anomaly detection?