I'm using R (with RStudio) to solve a small classification problem. I have this dataset (the image shows the contents of RStudio's 'Environment' pane):

[Image: RStudio 'Environment' pane showing the dataset c.data]

As you can see, the dataset c.data contains 3 variables, 2 of which are factors with 2 levels: 0 and 1. I want to plot the variable mpg on the x axis and vs on the y axis, so I give R the following command:

> plot(c.data$mpg, c.data$vs)

And this is the result:

[Image: plot of c.data$mpg against c.data$vs, with vs drawn at 1.0 and 2.0]

I don't understand why R automatically places the values of vs at "1.0" and "2.0". I'm fine with vs being displayed as a binary variable, but since its values are "0" and "1", I don't see why they are drawn at 1 and 2 instead. As a result, when I try to add the logistic regression line to the plot, the result is very poor:

[Image: the same plot with the logistic regression line added]

Why is this happening? How can I get the real values of vs on the y axis (i.e. 0.0 and 1.0) so that I can plot a coherent logistic regression line?

claudioz
  • There is a difference between labels and levels. vs has 2 levels, 0 and 1, however it has 2 different labels, 1 and 2. – user2974951 Sep 09 '19 at 13:00
  • @user2974951 that's backwards. The *labels* are 0 and 1, but factor *levels* as stored internally always start with 1 and go up from there. But the labels are not used on the y axis by default. – Gregor Thomas Sep 09 '19 at 13:03
  • When you use a factor as a numeric, the default will always be that the 1st level is 1, the 2nd level is 2, etc. In this case, you want to use the numbers in the factor labels instead, 0 and 1. The simplest is to convert the factor to numeric--perhaps as a new column in your data frame. The linked FAQ covers that. – Gregor Thomas Sep 09 '19 at 13:09
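
The conversion suggested in the comments could be sketched roughly as follows. This is only an illustration under assumptions: the real c.data is not shown in the question, so a stand-in is built from R's built-in mtcars data, and the names vs_num, fit, mpg_grid and p are made up for the example.

```r
# Stand-in for c.data (an assumption): mpg and vs taken from the built-in
# mtcars data, with vs stored as a factor as in the question.
c.data <- data.frame(mpg = mtcars$mpg, vs = factor(mtcars$vs))

# as.numeric() on a factor returns the internal codes (1, 2),
# which is why the plot in the question shows 1.0 and 2.0.
codes <- as.numeric(c.data$vs)

# Going through character first recovers the labels "0"/"1" as numbers 0/1.
# vs_num is a hypothetical new column name.
c.data$vs_num <- as.numeric(as.character(c.data$vs))

# Fit a logistic regression on the 0/1 response.
fit <- glm(vs_num ~ mpg, data = c.data, family = binomial)

# Plot the points on the 0/1 scale and overlay the fitted probabilities.
plot(c.data$mpg, c.data$vs_num, xlab = "mpg", ylab = "vs")
mpg_grid <- seq(min(c.data$mpg), max(c.data$mpg), length.out = 100)
p <- predict(fit, newdata = data.frame(mpg = mpg_grid), type = "response")
lines(mpg_grid, p)
```

With type = "response" the predictions are probabilities between 0 and 1, so the curve lines up with points plotted at 0 and 1 rather than at the factor codes 1 and 2.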
