
I am trying to cluster a data frame, but when I run the `dist` function I get the error "NAs introduced by coercion".

Error message

At first I thought it was because my data frame contained factor vectors, like this:

Data Frame

But I then made a new data frame with just numeric values and got the same error message:

New DF

So I am not sure why I am getting this error message. What am I not seeing?

Has QUIT--Anony-Mousse
Gordon
  • Can you provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? And I highly recommend giving the code as text and not in a png. – Qaswed Jun 09 '16 at 10:05
  • Sure, I am running the following command: `distances = dist(imputedTrainNoQuestions[, 2:5], method = "euclidean")` on my data frame, which contains only numerical values with no NAs – Gordon Jun 09 '16 at 10:08
  • I mean code that is really reproducible. If anyone other than you runs `distances = dist(imputedTrainNoQuestions[, 2:5], method = "euclidean")`, they'll get an error, because `imputedTrainNoQuestions` is not in their workspace. Can you run `dput(imputedTrainNoQuestions[sample(1:5568, size = 50),])` and post the results as text (not as a png!)? – Qaswed Jun 09 '16 at 11:25
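
The request above can be illustrated with a minimal sketch. The data frame `df` and its columns are made up for illustration and are not the asker's data; the point is only to show how `str()` reveals column types and how `dput()` produces pasteable text.

```r
# Hypothetical stand-in for the asker's data: one factor column, two numeric
df <- data.frame(
  answer = factor(c("yes", "no", "yes", "no")),
  age    = c(23, 35, 41, 29),
  score  = c(1.5, 2.0, 3.5, 4.0)
)

# Inspect the column classes before calling dist()
str(df)

# dput() prints R code that recreates the object, so others can paste it
set.seed(1)  # makes the random row sample repeatable
sampled <- df[sample(nrow(df), size = 3), ]
dput(sampled)
```

Posting the `dput()` output as text lets anyone rebuild the same object and run the failing `dist()` call themselves.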

1 Answer


Euclidean distance on factor data is nonsense.

No wonder it does not work!

Although the error will go away if you encode the data as numbers, the results will remain nonsense.
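
A minimal sketch of the point above, using a made-up data frame (`df`, `q1`, and `q2` are hypothetical names). It assumes the message comes from `dist()` converting non-numeric columns to numbers, which is the usual cause but was not confirmed for the asker's data.

```r
# Hypothetical data: yes/no responses stored as factors
df <- data.frame(
  q1 = factor(c("yes", "no", "yes")),
  q2 = factor(c("no", "no", "yes"))
)

# List the columns that are not numeric
non_numeric <- names(df)[!sapply(df, is.numeric)]
print(non_numeric)

# dist() converts the data frame to a matrix; with factor columns this is
# expected to trigger the coercion warning, which is captured here as text
msg <- tryCatch(
  dist(df, method = "euclidean"),
  warning = function(w) conditionMessage(w)
)
print(msg)

# Encoding factors as integer codes silences the warning, but the codes
# (1 = "no", 2 = "yes") are arbitrary labels, not measured quantities
df_num <- data.frame(lapply(df, as.integer))
d_num <- dist(df_num, method = "euclidean")
print(d_num)
```

The second call runs without complaint, yet the distances depend entirely on how the factor levels happen to be numbered.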

Has QUIT--Anony-Mousse
  • Well, I got rid of the error; there was a mistake in my code. I clustered using Euclidean distance and my model works fine. – Gordon Jun 13 '16 at 11:17
  • It may *run* but the results are not statistically meaningful! Be careful! – Has QUIT--Anony-Mousse Jun 13 '16 at 15:05
  • Why not, if I am using it to cluster a data set that I am then running a logistic regression on? – Gordon Jun 13 '16 at 15:15
  • Most of the vectors are yes/no responses – Gordon Jun 13 '16 at 15:18
  • Even for binary data, clustering is usually rather meaningless. There are "frequent combinations" of yes/no. But these are not found by regular clustering algorithms. See all the questions here about binary data - it does not "just work". Study the results in detail, and pay attention to what does not make sense (don't just try to "explain" them - you will be seeing things that are not there). – Has QUIT--Anony-Mousse Jun 13 '16 at 17:26
  • Thanks for the heads up – Gordon Jun 13 '16 at 18:39
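
As a rough illustration of the "frequent combinations" remark in the comments, the sketch below counts how often each yes/no pattern occurs and, for comparison, computes base R's asymmetric binary distance. The `responses` data frame is invented, and neither step is something the commenters proposed; it is only one way to look at such data before trusting any clustering result.

```r
# Hypothetical yes/no survey responses coded as 1 (yes) and 0 (no)
responses <- data.frame(
  q1 = c(1, 1, 0, 1, 0),
  q2 = c(0, 0, 1, 0, 1),
  q3 = c(1, 1, 0, 0, 0)
)

# Build one string per row, e.g. "101", and count each combination
combo <- do.call(paste0, responses)
combo_counts <- sort(table(combo), decreasing = TRUE)
print(combo_counts)

# Asymmetric binary distance from dist(): among the positions where at
# least one row has a 1, the share where only one of them does
d_bin <- dist(responses, method = "binary")
print(d_bin)
```

Comparing the combination counts with whatever clusters an algorithm reports is one way to check whether the clusters reflect real patterns in the responses.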