Meaning of a statement in R?

Question

I am trying to debug some R code in order to understand it. The statements are as follows:

library(rpart)
X = read.csv("strange_binary.csv");
fit  = rpart(c ~ X + X.1 + X.2 + X.3 + X.4 + X.5 + X.6 + X.7 + X.8 + X.9, method ="class",data=X,minbucket=1,cp=.04);
printcp(fit);
fit = prune(fit,cp=.04);

pred = predict(fit,X[,1:10],type="vector")      # test the classifier on the training data
pred[pred == 2] = "bad"
pred[pred == 1] = "good"

The aim is to build a classifier and to test it on the training data. However, I do not understand the statements:

pred[pred == 2] = "bad"
pred[pred == 1] = "good"

pred==2 and pred==1 would each be either TRUE or FALSE, so how are they being used to index a vector? Sorry for my naive question; I come from a C++ background and am taking baby steps in R.

Thanks for your help!

You can use a logical vector to select array elements. Type `?"["` for help. — G5W, Apr 13 '17 at 00:25
Try something like `x = c("a", "b", "c", "d"); x[c(FALSE, TRUE, TRUE, FALSE)]`. Logical indexing/subsetting like this is very common in R. — Marius, Apr 13 '17 at 00:28
`pred` is a vector holding the result of `predict`. So it looks like the model predicts each result as either 1 or 2, and those statements just change the results to the character strings "good" and "bad" respectively. — Andrew Lavers, Apr 13 '17 at 00:28
@epi99, could you please elaborate? I am using standard function `predict()` for prediction. How does this limit the predicted values to `1` and `2` then? The `strange_binary.csv` file has values `0` and `1`. — , Apr 13 '17 at 00:31
Were you expecting different output from `predict()`? `predict` is a generic function that acts differently depending on the type of model fit you pass to it, and the `type` argument you give. I assume that for your model, getting predicted classes, i.e. only `1` and `2`, makes sense. If you want some other type of prediction, you should explore the `type` options for your specific model. — Marius, Apr 13 '17 at 00:36
@user6490375, my understanding is that `predict` is a generic function that can be applied to different classes. `rpart` returns an object which knows how to do the prediction, so the result is really determined by the specific model (rpart) and how it is set up. I don't know much about rpart specifically. — Andrew Lavers, Apr 13 '17 at 00:37
@Marius, the `strange_binary.csv` file has 0s and 1s. What I am wondering is how the predicted values can be 1 and 2. — , Apr 13 '17 at 00:41

score 1 · Accepted Answer · edited Apr 13 '17 at 10:13

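pred == 2 does not produce a single TRUE or FALSE: the comparison is applied to every element of pred and gives a logical vector of the same length. Using that vector as an index selects the elements where it is TRUE, so this is a way of saying: assign the value "bad" to the subset of pred where pred is equal to 2.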

pred[pred == 2] = "bad"

Likewise, assign the value "good" to the subset of pred where pred is equal to 1:

pred[pred == 1] = "good"

A more conventional R style uses <- for assignment; the result here is identical:

pred[pred == 2] <- "bad"
pred[pred == 1] <- "good"

So the two statements relabel the predictions: every element of pred equal to 2 becomes "bad", and every element equal to 1 becomes "good".
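To see the indexing in isolation, here is a minimal sketch on a made-up vector. The values in pred are invented for illustration and no model is involved; mask is a helper name introduced only for this example.

```r
# Hypothetical stand-in for the output of predict(); the values are made up.
pred <- c(1, 2, 2, 1, 2)

# The comparison is vectorised: it yields one TRUE/FALSE per element.
mask <- pred == 2
print(mask)   # FALSE  TRUE  TRUE FALSE  TRUE

# A logical vector used as an index selects the positions that are TRUE.
# Assigning a string turns the whole vector into a character vector.
pred[mask] <- "bad"
print(pred)   # "1"   "bad" "bad" "1"   "bad"

# The number 1 is coerced to "1" for this comparison, so it still matches.
pred[pred == 1] <- "good"
print(pred)   # "good" "bad"  "bad"  "good" "bad"
```

Coming from C++, one way to read pred[pred == 2] <- "bad" is as a loop over the vector that overwrites each element passing the test, written without the explicit loop.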

EDIT:

Because you also asked in the comments what pred is: I would recommend executing your code above a single line at a time. At each stage you can see what has changed by calling str() on the new variable to see its structure. It will give you the dimensions and types of the data, along with a few example values.

str(fit)
str(pred)

It will help you get a feel for what is occurring at each step.
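As a rough sketch of what such an inspection might show, and of why the predictions can be 1 and 2 when the file holds 0 and 1: for a classification tree, predict() with type="vector" returns the predicted class as a number, which is the position of the class among the factor levels rather than the original value. The example below only imitates this with a plain factor. The vector c_values is invented, and no rpart model is fitted.

```r
# Made-up 0/1 response, standing in for the column c of the CSV file.
c_values <- c(0, 1, 1, 0)

# Treated as class labels, the values become a factor with levels "0" and "1".
classes <- factor(c_values)
print(levels(classes))   # "0" "1"

# The integer codes are positions in levels(): "0" -> 1, "1" -> 2.
codes <- as.integer(classes)
str(codes)               # int [1:4] 1 2 2 1
```

If the model in the question behaves this way, a prediction of 1 would correspond to the value 0 in the data and a prediction of 2 to the value 1; checking str(pred) against the original column is a quick way to confirm.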

edited Apr 13 '17 at 10:13 by ikop
answered Apr 13 '17 at 00:30 by sconfluentus

I don't think using `<-` for assignment is "more R-like". It's mostly personal preference, and it makes no difference in this case. – Marius Apr 13 '17 at 00:33
What exactly is `pred`? I mean to ask, which column (or row) of `pred` will be affected? And why on that one? – Apr 13 '17 at 00:34
I believe that most style guides still recommend `<-` for assignment. It may make no difference in most cases, but there are cases where it does. See http://stackoverflow.com/questions/1741820/assignment-operators-in-r-and. – neilfws Apr 13 '17 at 00:38
pred is the variable to which you assigned the result of `pred = predict(fit,X[,1:10],type="vector")`, that is, the predictions from the model fitted a few lines earlier. @marius, both ways of assigning are acceptable and equivalent here; `<-` is just a convention in R. It was not the main point of my explanation, just something I thought someone new to R might need, since they will likely see it in other explanations. – sconfluentus Apr 13 '17 at 00:39
