
I have tried to run this in both Jupyter Notebook and Google Colab, but I am still getting this error from fcmeans. It works fine on a different laptop, though. This is the code used to split the dataset:

# Stratified sampling using scikit-learn's StratifiedShuffleSplit class
from sklearn.model_selection import StratifiedShuffleSplit

split = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=42)

for train_index, test_index in split.split(data1, data1["class"]):
    strat_train_set = data1.loc[train_index]
    strat_test_set = data1.loc[test_index]

train_set = strat_train_set.drop("class", axis=1)  # drop labels for training set
train_labels = strat_train_set["class"].copy()
test_set = strat_test_set.drop("class", axis=1)    # drop labels for testing set
test_labels = strat_test_set["class"].copy()
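
The clustering step itself isn't shown above; presumably it is something like this (a minimal sketch, with the fcmeans import and the tr_set name taken from the comments below, and the cluster count assumed):

from fcmeans import FCM

tr_set = train_set       # features from the split above; name as used in the comments
fcm = FCM(n_clusters=2)  # assumed cluster count
fcm.fit(tr_set)          # this is the call that raises the error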

What am I missing here then?

[screenshot of the error traceback]

  • Please don't post images, and do create a complete example. We don't know where `tr_set` is coming from. We can guess its type, but it's better if we don't have to; it makes answering easier. Also, it seems you should look into its datatype to investigate further. – user2736738 Sep 09 '21 at 07:15
  • Thanks for suggesting. The dtype is int64 – partho Sep 09 '21 at 07:31
  • It would be best to share the Colab notebook so that I can run it. – user2736738 Sep 09 '21 at 07:41
  • Here's the drive link https://drive.google.com/drive/folders/1yVz26NsvkZK9Tf_mbuOLBhJaqQkjGZGH?usp=sharing – partho Sep 09 '21 at 12:18

1 Answer


The problem here is that `tr_set` is not a `numpy.ndarray`. So all you need to do is pass the DataFrame as a NumPy array.

In your case, if you use the `to_numpy` function before passing the data to `fit` (like this: `fcm.fit(tr_set.to_numpy())`), it will work.

This was pretty clear from the fcm documentation.
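
In full, the fix would look roughly like this (a minimal sketch; tr_set and the n_clusters value are assumptions carried over from the question and comments, and fit/predict follow the fcmeans API):

from fcmeans import FCM

fcm = FCM(n_clusters=2)                  # assumed cluster count
fcm.fit(tr_set.to_numpy())               # convert the DataFrame to an ndarray first
labels = fcm.predict(tr_set.to_numpy())  # hard cluster label for each row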

  • Again getting an error in Jupyter Notebook while it works well on Colab: `ValueError Traceback (most recent call last) in 5 clf2 = DecisionTreeClassifier() 6 clf = GridSearchCV(clf2, parameters, scoring='balanced_accuracy', cv=10) ----> 7 res1 = clf.fit(tr_fin, train_labels) 8 res1.best_estimator_` – partho Sep 10 '21 at 15:57
  • @Aka001 If it's a different error, maybe make a new question and tag me or let me know the link. The question should be concise, directly pointing to the error, with a minimal, complete, verifiable example. – user2736738 Sep 10 '21 at 19:44
  • Thanks for the reply. Actually, it was a fitting error because too high a value was selected for FCM(n_clusters). Suppose a dataset has 5 columns; then it should be "FCM(n_clusters=4)", or the value should be kept below roughly 70% of the original number of columns. Here "FCM(n_clusters=4)" worked well in Colab but failed to fit with the scikit-learn classifier in Jupyter Notebook, so I changed "FCM(n_clusters=4)" to "FCM(n_clusters=2)" and it finally worked! I don't know the actual reason; when I tried to run the same code on another dataset in Colab, it demanded exactly the same original column count for the "FCM(n_clusters=)" argument. – partho Sep 10 '21 at 20:17
  • @Aka001 Then it's different from the initial error you got. In any case, if related problems arise in the future, ask them in a separate question. The probability of getting an answer is much higher that way. – user2736738 Sep 10 '21 at 20:18
  • Yeah, will remember that. – partho Sep 10 '21 at 20:21
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/236999/discussion-between-aka001-and-user2736738). – partho Sep 11 '21 at 11:15