
library(class)  # provides knn()

# training set
trainSample <- cbind(data[1:980,1], data[1:980,2])
cl <- factor(c(data[1:980,3]))

# test set
testSample <- cbind(data[981:1485,1], data[981:1485,2])
cl.test <- clknn

# prediction
k <- knn(trainSample, testSample, cl, k = 5)

# output
> k

  [1] 2 2 1 1 1 1 2 1 2 1 1 2 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2 2 2 1 2 2 1 1 2 2 1 1 2 2 2 2 1 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2
 [60] 2 2 2 2 1 2 2 2 2 1 2 2 1 2 2 2 1 1 2 1 2 2 1 1 1 2 1 2 2 2 1 2 2 2 2 2 1 2 1 2 2 2 2 2 2 2 2 1 2 2 2 2 1 2 2 2 2 2 2
[119] 2 2 2 1 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2 2 2 2 1 2 1 1 1 1 2 2 2 2 2 2 2 2 1 2 1 2 2 2 2 2 2 1 2 2 1 2 1 2 2 2 2
[178] 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2 1 1 1 1 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 1
[237] 2 2 2 2 2 1 2 2 1 2 2 1 2 2 2 2 2 1 2 2 2 2 2 2 2 1 2 2 2 2 2 2 1 2 2 1 2 2 2 2 1 2 1 2 2 2 2 1 1 2 1 2 2 2 2 1 2 2 2
[296] 2 2 2 1 2 1 2 1 1 1 2 1 2 2 1 1 2 2 1 2 1 2 2 1 2 2 2 1 2 2 2 2 2 1 2 2 2 1 2 2 2 1 2 2 2 2 2 2 2 1 2 1 1 2 2 2 1 1 2
[355] 1 2 1 2 1 2 1 2 2 2 2 2 2 1 1 1 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 1 2 2 2 2 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2
[414] 2 2 1 2 2 2 2 2 2 2 2 2 1 1 2 2 2 1 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[473] 2 2 2 2 2 1 1 2 2 2 2 2 1 2 2 1 1 2 2 1 2 2 1 2 1 2 2 1 2 2 2 2 2
Levels: 1 2

I want the labels "c" and "not-c" (as in my original data.csv) instead of 1 and 2. (I'm also not sure which number is supposed to represent which.)

Can anyone help?
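A minimal sketch of the same setup may show where the 1 and 2 come from. It uses hypothetical toy data in place of data.csv (the names `toy`, `x1`, `x2`, `label`, `train_idx` and `test_idx` are made up for illustration). Assuming the label column was read as a factor, its levels are sorted, so code 1 would be "c" and code 2 "not-c". In R versions before 4.1.0, `c()` on a factor returns those bare integer codes, and `knn()` labels its output with the levels of `cl`, hence 1 and 2. Passing the factor itself keeps the names.

```r
library(class)

# Hypothetical toy data standing in for data.csv: two numeric features plus a
# label column stored as a factor (read.csv() did this by default before R 4.0).
set.seed(1)
toy <- data.frame(
  x1 = c(rnorm(20, mean = 0), rnorm(20, mean = 3)),
  x2 = c(rnorm(20, mean = 0), rnorm(20, mean = 3)),
  label = factor(rep(c("c", "not-c"), each = 20))
)

# Levels are sorted, so integer code 1 means "c" and code 2 means "not-c".
levels(toy$label)
head(as.integer(toy$label))  # the bare codes that c() returned in R < 4.1

train_idx <- c(1:15, 21:35)
test_idx  <- setdiff(seq_len(nrow(toy)), train_idx)

# Pass the factor itself as cl (no c()); knn() names its output with levels(cl).
pred <- knn(train = toy[train_idx, c("x1", "x2")],
            test  = toy[test_idx, c("x1", "x2")],
            cl    = toy$label[train_idx],
            k     = 5)
levels(pred)
table(pred)
```

If the labels were read as character instead (the default since R 4.0), `factor(data[1:980, 3])` should give the same named levels.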

Jaap
Beginner questions

4 Answers


It is very easy to change the factor levels without getting confused about which is which:

Example data:

> a <- factor(rep(c(1,2,1),50))
> a
  [1] 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2
 [75] 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1
[149] 2 1
Levels: 1 2

#this will help later as a verification
#this counts the instances for 1 and 2
> table(a)
a
  1   2 
100  50 

So, as you can see above, the order of the levels is 1 first and 2 second. When you change the levels (below), the order remains the same:

#the assignment function levels can be used to change the levels
#the order will remain the same i.e. 'c' for '1' and 'not-c' for '2'
levels(a) <- c('c', 'not-c')

> a
  [1] c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c    
 [25] c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c    
 [49] c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c    
 [73] c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c    
 [97] c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c    
[121] c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c     c     not-c c    
[145] c     not-c c     c     not-c c    
Levels: c not-c

And this is the verification:

> table(a)
a
    c not-c 
  100    50 
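Applied to the knn() output, the same assignment can take its names from the original label column instead of typing them by hand, which avoids swapping them by accident. This is a minimal sketch, assuming the codes 1 and 2 came from that column's factor levels (as `c()` produces in R < 4.1); `orig` and `k` are hypothetical stand-ins for `data[, 3]` and the prediction.

```r
# Hypothetical stand-ins: 'orig' plays the role of the label column read from
# data.csv, and 'k' plays the role of the knn() output with levels "1" and "2".
orig <- factor(c("not-c", "c", "c", "not-c", "c"))
k <- factor(c(2, 1, 1, 2, 2))

# Each level of k is an integer code; use it to index levels(orig) so the
# names are copied in the original order rather than typed by hand.
levels(k) <- levels(orig)[as.integer(levels(k))]
k
table(k)
```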
LyzandeR
  • You answered a question for me about ensembles; I am struggling to find another classifier to use. Also, how do I factor the results from each classifier, i.e. how do I factor this `h_results$TrueLabel==nb_results$Prediction`, or does the classifier do it for me? Sorry about this. – Beginner questions Apr 19 '15 at 20:50
  • No problem, happy to help :). For the combine classifier you mean? As I said you can use which ever you like. I don't know why you would like to factor `h_results$TrueLabel==nb_results$Prediction` (maybe to use `table` later?) but assuming that their lengths are equal just wrap it inside a `factor` function and that's it. – LyzandeR Apr 19 '15 at 21:05
  • I would suggest if your question is more deep and you need more info about something to ask a new question (adding a reproducible example and a wanted output) and people will help. You can also send me the link here to attempt answering it myself if you like. (I am saying this because it is difficult to try and solve a problem through chatting in the comments) – LyzandeR Apr 19 '15 at 21:08
  • Thanks for your replies. Yeah, that code gives me a TRUE/FALSE result for each data sample, e.g. something like TRUE TRUE TRUE FALSE TRUE etc., depending on whether it's correct or not. Do I need to factor that and the other classifiers' results as well, then combine them all together in one table and classify that table with the true label also in there? (I hope what I just said made any sense) haha – Beginner questions Apr 19 '15 at 21:08
  • Comparing the true labels with the predicted is only for measuring the accuracy. In the combined model you need to only use the predicted values. – LyzandeR Apr 19 '15 at 21:11
  • I probably should post a new question with the main code and stuff, I suppose – Beginner questions Apr 19 '15 at 21:11
  • It is not very clear what you are asking :), because of the small space here. It might be better to add another question yes with more details. – LyzandeR Apr 19 '15 at 21:12
  • Ok I see what you are saying, so here is my prediction code predict(h_fit,alldata[testSample,], type="class" ) Do I need to factor this result like factor(predict(h_fit,alldata[testSample,], type="class" )) or something? or does a classifier do it – Beginner questions Apr 19 '15 at 21:13
  • Right now it makes sense. It really depends how the predict method works for each model you use. Use `str` to see whether `predict(h_fit,alldata[testSample,], type="class" )` is a factor and if not it is probably better to convert it the way you show me. – LyzandeR Apr 19 '15 at 21:16
  • Thanks a lot, I will experiment some more. – Beginner questions Apr 19 '15 at 21:19
  • No problem. Feel free to ask me anything else you want or post a link to a question you have asked. (btw you now have the privilege to upvote as well :P - get the community to grow) – LyzandeR Apr 19 '15 at 21:21
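For the accuracy check discussed in these comments, here is a minimal sketch with hypothetical vectors `truth` and `pred`. It checks whether the prediction is already a factor (`is.factor()` or `str()`), converts it with the same levels as the true labels if not, and then compares. Note that `==` between two factors requires identical level sets, which is one reason to give both the same `levels`.

```r
# Hypothetical true labels, stored as a factor with fixed levels
truth <- factor(c("c", "not-c", "c", "c", "not-c"), levels = c("c", "not-c"))

# Hypothetical predictions that came back as a plain character vector
pred <- c("c", "not-c", "not-c", "c", "not-c")

# Convert only if needed, reusing the levels of the true labels
if (!is.factor(pred)) pred <- factor(pred, levels = levels(truth))
str(pred)

correct <- truth == pred   # logical vector, TRUE where the prediction matches
mean(correct)              # proportion correct (accuracy)
table(truth, pred)         # confusion matrix
```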

Subscripted assignment also works. For example, here's a factor:

> a <- factor(sample(letters[1:5],100,replace=TRUE))
> a
  [1] a d d d d a d d a b a b e a c d a c a a b e e d a e d e e a a c a a a b a
 [38] b b a a e b d b c a a a b e b c e d d b b c c a b a d c b c c d e b d e d
 [75] a a a b e e c b c b c c d d e e d a e e e b c e b e
Levels: a b c d e

Now, let's give a couple of those levels new names:

> levels(a)[c(2,4)] <- c('y','z')
> a
  [1] a z z z z a z z a y a y e a c z a c a a y e e z a e z e e a a c a a a y a
 [38] y y a a e y z y c a a a y e y c e z z y y c c a y a z c y c c z e y z e z
 [75] a a a y e e c y c y c c z z e e z a e e e y c e y e
Levels: a y c z e
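Subscripted assignment can also select levels by their current name rather than by position, which may help when it is unclear where a level sits. Here is a sketch reusing the example data from the first answer.

```r
a <- factor(rep(c(1, 2, 1), 50))

# Select each level by its current name, not its position
levels(a)[levels(a) == "1"] <- "c"
levels(a)[levels(a) == "2"] <- "not-c"

table(a)
```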
cbare

You can do something like this:

> x <- factor(c(1,1,2,3,1), labels=c("group1","group2","group3"))
> x 
[1] group1 group1 group2 group3 group1 
Levels: group1 group2 group3
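When `labels` is given without `levels`, the labels are paired with the sorted unique values of `x` (here 1, 2, 3). Spelling out `levels` as well makes the pairing explicit. In this sketch, `k` is a hypothetical stand-in for the knn() output.

```r
# Hypothetical knn() output with levels "1" and "2"
k <- factor(c(2, 2, 1, 1, 2))

# 'levels' lists the existing codes and 'labels' gives the new name for each,
# position by position, so the pairing is written out explicitly
k_named <- factor(k, levels = c("1", "2"), labels = c("c", "not-c"))
k_named
```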

Or like this:

train <- read.csv("train.csv", header=TRUE)[1:1000, ]
labels <- train[,1]
Chris
  • This kind of works, but how do I know which one is which from my original dataset? I can swap the names round and the result will be reversed but the accuracy remains the same, so that's obviously not right? Thanks – Beginner questions Apr 17 '15 at 23:48
  • I can't get your second part to work; I don't really understand it. I tried this: newPred <- knn(train, traintest, labels, k = 5) and got this error: Error in knn(train, traintest, labels, k = 5) : NA/NaN/Inf in foreign function call (arg 6) – Beginner questions Apr 18 '15 at 00:47
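Regarding the `NA/NaN/Inf in foreign function call (arg 6)` error: `knn()` passes `train` to compiled code as numbers, so one likely cause is that the label column (or another non-numeric or missing value) is still inside `train`. Below is a minimal sketch with a hypothetical inline CSV standing in for train.csv, where the labels sit in column 1 and are kept out of the feature matrix; the column names and test points are made up.

```r
library(class)

# Hypothetical stand-in for train.csv: label in column 1, numeric features after it
csv_text <- "label,x1,x2
c,0.1,0.3
c,0.2,0.1
c,0.0,0.2
not-c,3.1,2.9
not-c,2.8,3.2
not-c,3.0,3.0"
train <- read.csv(text = csv_text)

labels <- factor(train[, 1])  # keeps "c" / "not-c" as level names
features <- train[, -1]       # numeric columns only, label column dropped

# Two made-up test points, one near each cluster
test <- data.frame(x1 = c(0.1, 2.9), x2 = c(0.2, 3.1))

pred <- knn(features, test, labels, k = 3)
pred
```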

Use the forcats package:

library(forcats)
a <- factor(rep(c(1,2,1),50))

fct_collapse(a, c = c("1"), `not-c` = c("2"))
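forcats also has `fct_recode()`, which is aimed at renaming individual levels (`new_name = "old_name"`); `fct_collapse()` works here because each new group contains a single old level. Here is a minimal sketch on the same example factor, assuming forcats is installed.

```r
library(forcats)

a <- factor(rep(c(1, 2, 1), 50))

# Rename each old level explicitly by name: new = "old"
b <- fct_recode(a, c = "1", `not-c` = "2")
table(b)
```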
dondapati
  • Nice. You don't need the `c()`s for one item: ``fct_collapse(a, c = "1", `not-c` = "2")`` – jtr13 Feb 20 '18 at 19:49