I am trying to plot an elbow-method chart to decide how many clusters to use for k-means clustering. I am using the following code:
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans

iris = sns.load_dataset('iris')
iris.columns
data = iris.iloc[:, 0:3]  # select all rows and the first 3 columns
sse = {}
for k in range(1, 10):
    kmeans = KMeans(n_clusters=k, max_iter=1000).fit(data)
    data["clusters"] = kmeans.labels_
    sse[k] = kmeans.inertia_  # inertia: sum of squared distances of samples to their closest cluster center
plt.figure()
plt.plot(list(sse.keys()), list(sse.values()))
plt.xlabel("Number of clusters")
plt.ylabel("SSE")
plt.show()
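For comparison, here is a minimal sketch of the same elbow loop that never writes back into the feature table. It assumes scikit-learn's bundled iris data (`sklearn.datasets.load_iris`) as a stand-in for `sns.load_dataset('iris')`, so nothing has to be downloaded, and it uses all four numeric columns. `n_init=10` and `random_state=0` are illustrative choices, not values from the original. One thing worth knowing about the loop above: `data["clusters"]` persists between iterations, so from the second value of `k` onward each fit also receives the previous labels as an extra feature.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Assumption: sklearn's bundled iris data stands in for seaborn's copy
# (150 samples, 4 numeric measurements), so no network access is needed.
X = load_iris().data  # NumPy array of shape (150, 4)

sse = {}     # k -> inertia (sum of squared distances to the nearest center)
labels = {}  # k -> cluster label per sample, stored outside X

for k in range(1, 10):
    kmeans = KMeans(n_clusters=k, n_init=10, max_iter=1000, random_state=0).fit(X)
    sse[k] = kmeans.inertia_
    labels[k] = kmeans.labels_  # X itself is never modified

for k, value in sse.items():
    print(k, round(value, 2))
```

The plotting lines (`plt.plot(list(sse.keys()), list(sse.values()))` and so on) can be reused unchanged with this `sse` dict.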
However, if I select more than 3 columns in my code, for example by changing:
data = iris.iloc[:, 0:3]
to:
data = iris.iloc[:, 0:4]
the code doesn't run.
In addition, if I use:
data = iris[['sepal_length', 'sepal_width', 'petal_length']]
instead of:
data = iris.iloc[:, 0:3]
the code also doesn't run.
Both problems produce the following message:
C:\Users...\Anaconda3\lib\site-packages\ipykernel_launcher.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until
C:\Users\ledag\Anaconda3\lib\site-packages\ipykernel_launcher.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
Why is this happening?
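A minimal sketch of the situation the SettingWithCopyWarning describes, using a small hypothetical DataFrame instead of the real iris data. It adds a `clusters` column to a column selection, once directly and once after `.copy()`. Whether the first assignment actually warns depends on the pandas version (releases running with Copy-on-Write typically do not emit this warning at all); the `.copy()` variant makes it explicit that the selection is an independent frame, so writing to it is not ambiguous. The helper `assign_labels` is hypothetical, written only for this illustration.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for a few iris rows (not the real dataset).
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "sepal_width": [3.5, 3.0, 3.3],
    "petal_length": [1.4, 1.4, 6.0],
    "species": ["setosa", "setosa", "virginica"],
})


def assign_labels(frame, labels):
    """Write a 'clusters' column into frame and return the names of any warnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        frame["clusters"] = labels
    return [w.category.__name__ for w in caught]


# A list-based column selection may be flagged by pandas as a possible copy,
# so adding a column to it can trigger SettingWithCopyWarning.
sliced = df[["sepal_length", "sepal_width", "petal_length"]]
print(assign_labels(sliced, [0, 0, 1]))

# .copy() makes the selection an independent DataFrame, so writing to it
# should not emit SettingWithCopyWarning.
features = df[["sepal_length", "sepal_width", "petal_length"]].copy()
print(assign_labels(features, [0, 0, 1]))

# Neither assignment touches the original frame.
print(list(df.columns))
```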