
I am trying to perform linear regression on the data below.

I need to fit a linear regression of Water_Weight on Air_Weight.

Kindly let me know how to resolve this error.

This is the code I tried, but it gave an error:

fit <- lm(Water_Weight~Air_Weight, data=table1)

This is the data (posted as a screenshot in the original question; the values are reproduced at the end of the question).

Error:

    Warning messages:
    1: In model.response(mf, "numeric") :
      using type = "numeric" with a factor response will be ignored
    2: In Ops.factor(y, z$residuals) : ‘-’ not meaningful for factors



ID  GENDER  Air_Weight    Water_Weight  Body_Fat
01  1       75.60               *        14.17 
02  1       70.70              3.60      13.95 
03  1         *                4.00      8.98 
04  1       95.00              4.30      17.32   
05  1       73.20              3.80      11.50 
ayhan
Arpitgt
  • Welcome to Stack Overflow! Can you please include data and/or code that will provide us with a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) ? – Ben Bolker May 03 '16 at 16:02

2 Answers


You are having a problem with the structure of your data, probably caused by the way you read it into R. The most obvious issue is that you need to pass na.strings="*" as an additional argument when reading in your data (with read.csv() or read.table()); otherwise the * entries keep Air_Weight and Water_Weight from being read as numbers, and those variables are turned into factors.

There may be other problems, but they are impossible to diagnose remotely. Here's an example that shows this approach can work:

table1 <- read.table(header=TRUE,na.strings="*",text="
ID  GENDER  Air_Weight    Water_Weight  Body_Fat
01  1       75.60               *        14.17 
02  1       70.70              3.60      13.95 
03  1         *                4.00      8.98 
04  1       95.00              4.30      17.32
05  1       73.20              3.80      11.50") 

str(table1)
## 'data.frame':    5 obs. of  5 variables:
##  $ ID          : int  1 2 3 4 5
##  $ GENDER      : int  1 1 1 1 1
##  $ Air_Weight  : num  75.6 70.7 NA 95 73.2
##  $ Water_Weight: num  NA 3.6 4 4.3 3.8
##  $ Body_Fat    : num  14.17 13.95 8.98 17.32 11.5
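For contrast, here is a minimal sketch of what happens to the same text when `na.strings` is left out. `stringsAsFactors = TRUE` is passed explicitly to reproduce the default of R versions before 4.0.0 (current when this thread was written); from R 4.0.0 on the default is `FALSE`, and these columns would come back as character vectors instead. Either way they are not numeric, which is what the `lm()` warnings are complaining about.

```r
# Same five rows as in the question; "*" marks a missing value
txt <- "
ID  GENDER  Air_Weight    Water_Weight  Body_Fat
01  1       75.60               *        14.17
02  1       70.70              3.60      13.95
03  1         *                4.00      8.98
04  1       95.00              4.30      17.32
05  1       73.20              3.80      11.50"

# Without na.strings, "*" is just another string, so the column cannot be
# parsed as numbers and falls back to a factor (pre-4.0.0 default behaviour)
no_na <- read.table(header = TRUE, text = txt, stringsAsFactors = TRUE)
sapply(no_na, class)

# With na.strings = "*", those cells become NA and the columns stay numeric
with_na <- read.table(header = TRUE, text = txt, na.strings = "*")
sapply(with_na, class)
```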

If you are reading the data from a CSV file, you should use something like:

table1 <- read.csv("my_data_file.csv",na.strings="*")

(header=TRUE is the default for read.csv().)

Notice that in the structure of the data, Air_Weight and Water_Weight are numeric (abbreviated num). This is good. We can proceed with a linear model:

fit <- lm(Water_Weight~Air_Weight, data=table1)
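Since `my_data_file.csv` above is only a placeholder, here is a self-contained sketch that writes the question's five rows to a temporary CSV file (the path comes from `tempfile()`, so no existing file is assumed), reads it back with `na.strings = "*"`, checks that both variables are numeric before fitting, and reports how many rows `lm()` actually used. With the default `na.action = na.omit`, any row with an `NA` in either variable is dropped, so only 3 of the 5 rows enter the fit.

```r
# Write the example data to a temporary CSV, with "*" marking missing values
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "ID,GENDER,Air_Weight,Water_Weight,Body_Fat",
  "01,1,75.60,*,14.17",
  "02,1,70.70,3.60,13.95",
  "03,1,*,4.00,8.98",
  "04,1,95.00,4.30,17.32",
  "05,1,73.20,3.80,11.50"
), csv_file)

table1 <- read.csv(csv_file, na.strings = "*")

# Fail early with a clear message if either column was not parsed as numbers
stopifnot(is.numeric(table1$Air_Weight), is.numeric(table1$Water_Weight))

fit <- lm(Water_Weight ~ Air_Weight, data = table1)
nobs(fit)  # 3: rows 1 and 3 each contain an NA and are dropped
coef(fit)
```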
Ben Bolker
  • Even though I used it as you instructed, it didn't work. It gives the same error. – Arpitgt May 03 '16 at 15:58
  • that's fine. thanks! – Ben Bolker May 03 '16 at 16:40
  • @Gregor thanks for the help... but I am a little bit confused. If we have a large dataset, do we also have to write out the whole dataset like this: table1 <- read.table(header=TRUE,na.strings="*",text="whole dataset")? – Arpitgt May 03 '16 at 16:57
  • This is an **illustration**. You haven't given us a reproducible example as requested, or at least shown us what command you used to read your data. This is absolutely necessary if we're going to be able to help you, as your problem is *not* with linear regression but with loading your data into R in a useful way. – Ben Bolker May 03 '16 at 16:58
  • I am sorry for the inconvenience. I used this to import the data: `table <- read.csv(file.choose(),na.strings="*")` and then ran `fit <- lm(Water_Weight~Air_Weight, data=table)`. This is the error: > fit <- lm(Water_Weight~Air_Weight, data=table) Warning messages: 1: In model.response(mf, "numeric") : using type = "numeric" with a factor response will be ignored 2: In Ops.factor(y, z$residuals) : ‘-’ not meaningful for factors – Arpitgt May 03 '16 at 17:17
  • there's nothing obviously wrong with `table <- read.csv(file.choose(),na.strings="*")`. We'll probably need access to your actual file to go any farther; can you post it somewhere? The only thing short of that that *might* help is for you to post the results of `dput(table1)` (please **edit your question** to include this material rather than posting it in comments) – Ben Bolker May 03 '16 at 17:21
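If `na.strings = "*"` does not fix it, the file probably contains other entries that are not numbers. A minimal sketch of how one might find them, assuming the problem column has already been read in as a factor (or character vector); `non_numeric_entries()` is a hypothetical helper, not part of base R. Running it on the real column, e.g. `non_numeric_entries(table$Water_Weight)`, would list the offending strings, which is the kind of information the comment above asks for alongside `dput()`.

```r
# Hypothetical helper: list the distinct entries of a column that are
# neither NA nor convertible to a number
non_numeric_entries <- function(x) {
  vals <- as.character(x)                    # factor -> its labels, not its codes
  num <- suppressWarnings(as.numeric(vals))  # unparseable entries become NA
  unique(vals[is.na(num) & !is.na(vals)])
}

# Demo on the question's Water_Weight values, read in as a factor
water <- factor(c("*", "3.60", "4.00", "4.30", "3.80"))
non_numeric_entries(water)
```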

Try the following:

GENDER <- c(1,1,1,1,1)
Air_Weight <- c(75.60, 70.70, NA, 95.00, 73.20)
Water_Weight <- c(NA, 3.60, 4.00, 4.30, 3.80)
Body_Fat <- c(14.17, 13.95, 8.98, 17.32, 11.50)
ID <- c(01, 02, 03, 04, 05)
data <- data.frame(GENDER, Air_Weight, Water_Weight, Body_Fat)
data

This gives us the following:

       GENDER  Air_Weight   Water_Weight   Body_Fat
1      1       75.6          NA             14.17
2      1       70.7          3.6            13.95
3      1       NA            4.0            8.98
4      1       95.0          4.3            17.32
5      1       73.2          3.8            11.50

Then we fit the linear model with:

fit <- lm(Water_Weight~Air_Weight, data=data)
fit

And the output is:

Call:
lm(formula = Water_Weight ~ Air_Weight, data = data)

Coefficients:
(Intercept)   Air_Weight  
     1.7895       0.0265  
Amanda R.
  • I would also recommend looking at [this site](http://www.r-tutor.com/r-introduction/data-frame/data-import) and, within R, running help(read.table). – Amanda R. May 03 '16 at 16:38
  • your code doesn't quite work. Did you mean `data <- data.frame(GENDER, Air_Weight, Water_Weight, Body_Fat)` ? – Ben Bolker May 03 '16 at 16:41
  • @BenBolker Yes! Oops, I changed that in my R code but must have copied the wrong line into SO. I have changed it in my answer, so it should work now. – Amanda R. May 03 '16 at 16:44
  • @AmandaR. Thank you for the help. But in my data there are * values, as shown below. ID GENDER Air_Weight Water_Weight Body_Fat 01 1 75.60 * 14.17 02 1 70.70 3.60 13.95 03 1 * 4.00 8.98 04 1 95.00 4.30 17.32 05 1 73.20 3.80 11.50 – Arpitgt May 03 '16 at 16:52
  • @Arpitgt Why are those * there? I was looking at the table above to get the data. – Amanda R. May 03 '16 at 16:55
  • @Arpitgt are you looking at a small amount of data, or reading in your data from a .txt, .csv, etc. file? – Amanda R. May 03 '16 at 16:56
  • @AmandaR. I am reading it from my CSV file. The actual data contain * values. – Arpitgt May 03 '16 at 16:59
  • @Arpitgt are the * supposed to represent NA's or NULL's? – Amanda R. May 03 '16 at 17:14
  • @Arpitgt Answer has been edited – Amanda R. May 03 '16 at 17:19
  • @AmandaR. Yes, they represent NA, but when I try to use the CSV file, the * causes the error below. R treats the columns containing * as factors, so I am not able to convert the * to NA. > fit <- lm(Water_Weight~Air_Weight, data=table) Warning messages: 1: In model.response(mf, "numeric") : using type = "numeric" with a factor response will be ignored 2: In Ops.factor(y, z$residuals) : ‘-’ not meaningful for factors – Arpitgt May 03 '16 at 17:20
  • @Arpitgt within your CSV file, use find and replace to change the * to NA. From your comment above, it looks like having the '*' is changing the columns of your data frame from numeric to factor, which is causing the problem when running the 'lm' function. Once you change the * so that it is read in as NA, that should fix your problem. – Amanda R. May 03 '16 at 17:22
  • @AmandaR. Yes, I understand now after reading the whole conversation. Thanks a lot for the help. – Arpitgt May 03 '16 at 17:25
  • in principle you shouldn't have to edit your data file; that's what `na.strings="*"` is supposed to do. On the other hand, if editing the data file is what works then go for it. – Ben Bolker May 03 '16 at 17:35
  • Oh. Good point, I forgot about that argument of the `read.table()` function because I haven't used it as much. Thanks for pointing that out @BenBolker. – Amanda R. May 03 '16 at 17:41
  • @BenBolker Actually, I don't want to edit the data. Is there any code to convert those * into NA? – Arpitgt May 03 '16 at 17:47
  • @Arpitgt look at Ben Bolker's last comment (3 above this one). That's the way to do it w/o editing your data. – Amanda R. May 03 '16 at 17:48
  • Sorry for shouting, but **at this point we really really really need to see your actual CSV file in order to have any hope of understanding what's going on** – Ben Bolker May 03 '16 at 17:50
  • @BenBolker I added it to OneDrive: https://onedrive.live.com/redir?resid=2E578912508A8F81!46917&authkey=!AEhi16uS9A0TXdc&ithint=file%2ccsv – Arpitgt May 03 '16 at 17:53
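On the question of converting the `*` values in R without editing the file: besides `na.strings = "*"` at read time, an already-loaded data frame can be repaired after the fact. A minimal sketch, assuming the affected columns came in as factors (or character vectors) whose only non-numeric entry is `*`; `star_to_numeric()` and the stand-in data frame are hypothetical. The `as.character()` step matters, because calling `as.numeric()` directly on a factor returns its internal level codes rather than the values.

```r
# Hypothetical helper: turn a factor/character column that uses "*" for
# missing values into a numeric vector
star_to_numeric <- function(x) {
  x <- trimws(as.character(x))  # use labels, not factor codes; drop stray spaces
  x[which(x == "*")] <- NA      # "*" means missing
  as.numeric(x)
}

# Stand-in for a data frame read without na.strings (stringsAsFactors = TRUE)
table1 <- data.frame(
  Air_Weight   = factor(c("75.60", "70.70", "*", "95.00", "73.20")),
  Water_Weight = factor(c("*", "3.60", "4.00", "4.30", "3.80"))
)

cols <- c("Air_Weight", "Water_Weight")
table1[cols] <- lapply(table1[cols], star_to_numeric)
str(table1)

# The response is now numeric, so lm() fits without the factor warnings
fit <- lm(Water_Weight ~ Air_Weight, data = table1)
```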