I have a dataset that I have created in R. It is structured as follows: [image of the dataset]

I am trying to cluster the observations using k-means. However, I get the following error message:

> cl <- kmeans(sample, 3)

Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In storage.mode(x) <- "double" : NAs introduced by coercion

What does this mean? Am I preprocessing the data incorrectly? What can I do to fix it?

Has QUIT--Anony-Mousse
  • Your picture shows a mixture of character (ID, Genre) and numeric data. The kmeans function only works with numeric data. What does `str(samples)` show? – dcarlson Dec 07 '19 at 23:02

1 Answer

The documentation of kmeans (run ?kmeans in the console to see it) requires the argument x to be a:

numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).
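
As a quick check, here is a minimal sketch on a hypothetical data frame (the column names ID, Genre, x1 and x2 and all values are assumptions, not your real data). It shows which columns are numeric and clusters on those only:

```r
# Hypothetical toy data standing in for the real dataset
# (column names and values are assumptions for illustration)
samples <- data.frame(
  ID    = c("a1", "a2", "a3", "a4", "a5", "a6"),
  Genre = c("rock", "pop", "jazz", "rock", "pop", "jazz"),
  x1    = c(1.0, 1.2, 5.1, 5.3, 9.8, 10.1),
  x2    = c(2.0, 2.1, 6.0, 6.2, 1.1, 0.9),
  stringsAsFactors = FALSE
)

# TRUE for the columns that kmeans can use directly
numeric_cols <- sapply(samples, is.numeric)
print(numeric_cols)

# Keep only the numeric columns and cluster on those
set.seed(1)  # kmeans picks random starting centres
cl <- kmeans(samples[, numeric_cols], centers = 3)
```

If a column you expect to be numeric comes back FALSE, it was probably read in as character or factor and has to be converted first.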

Here, the first row of your data frame is what prevents it from being used by kmeans. I suspect that this first row is supposed to be your column names.

Moreover, you can't cluster on your second column, Genre, because it is character data, and I assume the first column (ID) should not be used either. Is that right?

So, if your dataset is called samples, try to do:

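# Use the first row as column names
colnames(samples) <- as.character(unlist(samples[1, ]))
# Drop that row, keep columns 3 onward, and convert them to numeric
samples_cluster <- as.data.frame(lapply(samples[-1, 3:ncol(samples)], function(col) as.numeric(as.character(col))))
cl <- kmeans(samples_cluster, 3)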
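
To illustrate the idea end to end, here is a sketch on made-up data. It assumes the header ended up in the first row, so every column was read as character. The names V1 to V4 and all values are hypothetical:

```r
# Hypothetical data frame where the header ended up in row 1,
# so every column was read as character
samples <- data.frame(
  V1 = c("ID", "a1", "a2", "a3", "a4", "a5", "a6"),
  V2 = c("Genre", "rock", "pop", "jazz", "rock", "pop", "jazz"),
  V3 = c("x1", "1.0", "1.2", "5.1", "5.3", "9.8", "10.1"),
  V4 = c("x2", "2.0", "2.1", "6.0", "6.2", "1.1", "0.9"),
  stringsAsFactors = FALSE
)

# Promote the first row to column names, then drop it
colnames(samples) <- as.character(unlist(samples[1, ]))
samples <- samples[-1, ]

# Columns 3 onward are still character; convert each one to numeric
samples_cluster <- as.data.frame(lapply(samples[, 3:ncol(samples)], as.numeric))

# Any NA here points to a value that could not be parsed as a number
stopifnot(!anyNA(samples_cluster))

set.seed(1)  # kmeans picks random starting centres
cl <- kmeans(samples_cluster, centers = 3)
table(cl$cluster)
```

The anyNA check matters because kmeans stops with the same "NA/NaN/Inf in foreign function call" error as soon as a single value fails to convert.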

Does this answer your question?

If not, can you provide a reproducible example of your dataset so that we can check the data frame before running kmeans? To do this, please see: How to make a great R reproducible example

dc37